Capturing the meaning of data

DataVaults aims to deliver a solution for collecting, storing, managing, sharing, analysing and monetising personal data and its derivatives, such as the results of data analysis and visualisation. Data queries, analysis and experimentation in DataVaults will also allow data from various sources to be linked, merged and combined with personal data, based on the DataVaults core data model. These activities rely on the semantic annotation and curation of data to make it findable and linkable and to improve its quality. They will raise the economic value of both personal and other kinds of data, as more detailed and interesting insights will be generated.

The DataVaults data model needs to describe a holistic Personal Data Value Chain addressing all the aspects of personal data management. This includes data protection, security, GDPR compliance, IPR management (compensation schemes etc.) and representation of the main value flows in data marketplaces, based on existing standards.

The DataVaults Core Semantic Data Model

In D1.2 – The DataVaults Core Semantic/Data Model, the project Consortium defines the DataVaults data model and the lifecycle of the data within DataVaults.

The DataVaults data model is based on Semantic Web technology, using the Resource Description Framework (RDF), and identifies related ontologies, concepts and vocabularies. It is specified as a profile of the Data Catalog Vocabulary (DCAT), an RDF vocabulary designed to facilitate interoperability between data catalogues published on the Web. Because RDF is open and extensible, the data model can be extended without breaking the system or the APIs under development.
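The extensibility claim can be illustrated with a minimal sketch in plain Python, without any RDF library. It is not taken from D1.2: the `example.org` namespace, the `sector` property and the sample dataset are hypothetical, and literals and URIs are both represented as plain strings for brevity. Only the RDF, DCAT and Dublin Core term URIs are real.

```python
# Minimal sketch: a DCAT-style dataset description as plain RDF triples.
# Literals and URIs are both plain strings here, which real RDF distinguishes.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/datavaults/"  # hypothetical namespace, not from D1.2

dataset = EX + "dataset/1"  # hypothetical dataset URI

# A graph is just a set of (subject, predicate, object) triples.
triples = {
    (dataset, RDF_TYPE, DCAT + "Dataset"),
    (dataset, DCT + "title", "Daily step counts"),
    (dataset, DCAT + "keyword", "activity"),
}


def objects(graph, subject, predicate):
    """Return all objects for a (subject, predicate) pair, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def titles_of_datasets(graph):
    """A consumer that only knows the core DCAT and Dublin Core terms."""
    subjects = [s for s, p, o in graph if p == RDF_TYPE and o == DCAT + "Dataset"]
    return sorted(t for s in subjects for t in objects(graph, s, DCT + "title"))


before = titles_of_datasets(triples)

# Extend the model with a property the consumer has never seen.
extended = triples | {(dataset, EX + "sector", "social-and-activity")}
after = titles_of_datasets(extended)

print(before == after)  # the existing consumer is unaffected by the new triple
```

Because the consumer selects triples by the predicates it understands, the additional `sector` triple is simply ignored, which is the property that lets an RDF-based model grow without breaking existing clients.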

Currently, the DataVaults data model incorporates data from six different sections, in order to cover the various aspects of data handling in a personal data management and monetisation framework. More specifically, it models basic profile data, location data and domain-specific data from three sectors, namely healthcare, social & activity, and smart home & energy. The domains have been selected based on the demonstrators’ fields of expertise and activity. The chosen properties follow the demonstrators’ requirements, but also include general properties that will cover other scenarios and stakeholders from other domains in the future.

The DataVaults Data Lifecycle

Apart from the data model, this deliverable also elaborates on the lifecycle of data within DataVaults. The DataVaults Data Lifecycle is based on the DataVaults Methodology and high-level usage scenarios, which provide a detailed description of how users will interact with one another and with data within DataVaults. As a result, the Lifecycle contains workflows that cover the management of personal data, in terms of data collection from various sources, data cleaning, transformation, linking, indexing and storage in the DataVaults Ecosystem. It also includes the workflows for executing data analytics and compensating Individuals for data assets they have shared with other parties.
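The data management part of the lifecycle (collection, cleaning, transformation, linking, indexing and storage) can be pictured as a chain of small steps. The sketch below is purely illustrative and does not reflect the actual DataVaults components: the record fields, the cleaning rule and the in-memory "vault" are all assumptions made for the example.

```python
# Illustrative sketch of the lifecycle stages; all names and rules are hypothetical.

# Collection: records arriving from different (made-up) sources.
collected = [
    {"Person": "p1", "Source": "wearable", "Steps": 4200},
    {"Person": "p1", "Source": "smart_meter", "kWh": 3.1},
    {"Person": "p2", "Source": "wearable", "Steps": None},
]


def clean(records):
    """Cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: normalise field names to lower case."""
    return [{k.lower(): v for k, v in r.items()} for r in records]


def link(records):
    """Linking: group records from different sources by the person they describe."""
    linked = {}
    for r in records:
        linked.setdefault(r["person"], []).append(r)
    return linked


def index(linked):
    """Indexing: build a lookup from source name to the persons it has data for."""
    idx = {}
    for person, recs in linked.items():
        for r in recs:
            idx.setdefault(r["source"], set()).add(person)
    return idx


linked = link(transform(clean(collected)))
idx = index(linked)

# Storage: an in-memory dictionary stands in for the real store.
vault = {"records": linked, "index": idx}
print(sorted(vault["index"]))
```

Keeping each stage as a separate function mirrors the idea of distinct workflows: a stage can be replaced or extended without touching the others.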

Figure 1 The DataVaults Data Lifecycle

The DataVaults Data Model and Data Lifecycle are the backbone of the DataVaults Personal Data Platform, which will facilitate the harmonisation, enrichment and management of data collected from diverse sources, in order to create real value for its users.