Value chain? What for?
Value chains are a concept applied across many industries, referring to the set of activities performed to deliver or maintain a product. What is the product in the case of DataVaults? Of course, it is data and, more specifically, personal data. We are therefore talking about Personal Data Value Chains here: the collection of activities carried out to deliver value from personal data to a range of stakeholders, including the data owners and the data seekers, among others.
So, the desired outcome is value, and extracting value from personal data is the driving force behind the DataVaults data value chain.
DataVaults & BDVA
As DataVaults is one of the projects funded under a topic related to the Big Data Value PPP, it is always good to piggyback on the so-called BDV Reference Model and try to position ourselves in some of its layers. At first glance, DataVaults matches quite well with the vertical layer on Data sharing platforms, Industrial/Personal, as highlighted in the figure below.
But if we go a little deeper, the horizontal layers of the BDV Reference Model map almost intuitively to a data value chain. However, the reference model also includes other aspects, such as the particularisation to data types (in yellow) or vertical concerns, which make it more difficult to visualise a typical data value chain mapping. This is where the 4-step data pipeline developed in the sister project DataBench (also funded under a topic of the same PPP) comes to our rescue.
DataVaults & DataBench: the sisterhood of the value chain
DataBench provides an intuitive version of a data value chain that has been adopted by several projects to map their own architectures to a data pipeline consisting of four typical steps, as represented in the figure below:
- Data Acquisition and Collection: the actions related to data ingestion and processing for the different datasets.
- Data Storage/Preparation: all the activities related to data storage, access and retrieval, including processes for data protection, curation, integration and publication.
- Data Analysis/AI/Machine Learning: the activities and processes related to data analytics and AI, including aspects such as ML training, operation and reproducibility.
- Data Action/Interaction, Visualisation/Access: any interaction between the data-enabled system and the system boundaries (people and machines), such as data visualisation techniques, APIs, etc.
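To make the four steps easier to refer to programmatically, here is a minimal sketch in Python (purely illustrative, not part of the DataBench tooling) of how the pipeline steps could be encoded so that architecture components can later be tagged against them:

```python
from enum import Enum


class PipelineStep(Enum):
    """The four generic steps of the DataBench data pipeline."""

    ACQUISITION_COLLECTION = "Data Acquisition/Collection"
    STORAGE_PREPARATION = "Data Storage/Preparation"
    ANALYSIS_AI_ML = "Data Analysis/AI/Machine Learning"
    ACTION_INTERACTION = "Data Action/Interaction, Visualisation/Access"


# Example: refer to a step by name and print its label.
print(PipelineStep.ACQUISITION_COLLECTION.value)
```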
Within DataVaults deliverable D1.4, we mapped the core data value chain and the main elements of the DataVaults architecture to the DataBench generic pipeline. This mapping can be seen in the figure below:
For this mapping, we used the first version of the DataVaults architecture diagram and mapped it to the data pipeline using the following colour code (a small code sketch of this mapping follows the list):
- Red for activities that correspond to the first step of the pipeline, i.e., Data Acquisition/Collection.
- Yellow for activities that correspond to the second step, i.e., Data Storage/Preparation.
- Green for activities that correspond to the third step, i.e., Data Analysis/AI/Machine Learning.
- Blue for activities that correspond to the fourth and final step, i.e., Data Action/Interaction, Visualisation/Access.
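As a toy illustration of this colour code, the sketch below pairs each pipeline step with its colour and tags a few components; the component names are hypothetical placeholders, not the actual DataVaults architecture elements:

```python
# Colour assigned to each DataBench pipeline step in the mapping.
COLOUR_BY_STEP = {
    "Data Acquisition/Collection": "red",
    "Data Storage/Preparation": "yellow",
    "Data Analysis/AI/Machine Learning": "green",
    "Data Action/Interaction, Visualisation/Access": "blue",
}

# Hypothetical component names, used only to illustrate the tagging.
COMPONENT_TO_STEP = {
    "data_retriever": "Data Acquisition/Collection",
    "secure_vault_storage": "Data Storage/Preparation",
    "analytics_engine": "Data Analysis/AI/Machine Learning",
    "sharing_dashboard": "Data Action/Interaction, Visualisation/Access",
}

# Derive the colour each component would get in the architecture diagram.
for component, step in COMPONENT_TO_STEP.items():
    print(f"{component}: {step} -> {COLOUR_BY_STEP[step]}")
```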
In DataVaults we also went a step further by providing an initial mapping to the DataBench blueprint, which divides each of the four steps into several operational building blocks typical of big data and AI systems. An example of this operational mapping, using the same colour code for the DataVaults components as above, can be seen in the figure below:
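Complementing that figure, here is a small nested sketch of how such a two-level mapping could be encoded; both the building-block names and the component names are invented placeholders, not the actual DataBench blueprint vocabulary or DataVaults components:

```python
# Purely illustrative: building-block and component names are invented
# placeholders, not the real DataBench blueprint or DataVaults elements.
BLUEPRINT_MAPPING = {
    "Data Acquisition/Collection": {
        "stream ingestion": ["wearable_collector"],
        "batch ingestion": ["data_retriever"],
    },
    "Data Storage/Preparation": {
        "secure storage": ["personal_data_vault"],
        "curation & anonymisation": ["anonymiser"],
    },
    "Data Analysis/AI/Machine Learning": {
        "model training & operation": ["analytics_engine"],
    },
    "Data Action/Interaction, Visualisation/Access": {
        "APIs & visualisation": ["sharing_dashboard", "query_api"],
    },
}

# Each component now sits in a step *and* an operational building block.
for step, blocks in BLUEPRINT_MAPPING.items():
    for block, components in blocks.items():
        print(f"{step} / {block}: {', '.join(components)}")
```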
Conclusion
The definition of value chain presented in this post is further explained in the project deliverables D1.1 and D1.4. This exercise not only helps adapt the project architecture to a generic definition of the value chain, such as the one provided in the scope of the Big Data Value PPP, but also makes it possible to map the data value chains of any project in a standardised way, and therefore to make the results comparable. Moreover, the position of the different elements of the architecture in the operational blueprint helps to understand the dynamics of the system and can help with the integration and benchmarking of the system. A win-win sisterhood, isn't it?