Over the last few months, the DataVaults development team has been delivering a set of technical tools that are integrated under the DataVaults infrastructure and belong either to the DataVaults Personal App or to the DataVaults Cloud-based Platform.
Development started from the definition of requirements and, as usual, progressed with the release of mockups that were evaluated by early users. The next step was to transform these mockups into functional interfaces to be used in the platform. In DataVaults, the components to be delivered are grouped in “bundles”, which are forwarded for integration following an agile approach and the DataVaults product roadmap, which foresees four releases: Alpha, Beta, v0.5 and final.
One of these bundles, which we will present in a series of three blog posts, is the “DataVaults Components for Data Sharing, Value Generation and Intelligence” bundle. It consists of the following open-source components:
- The Data Fetcher and Transformation, which is tasked with the collection of data from individuals and, as such, is housed in the DataVaults Personal App
- The Query Builder is a component operated by the Data Seekers that allows querying the Data Store of the cloud-based platform. It presents Data Seekers with a data catalogue and the contents of the data store (which they can access)
- The Secure Analytics Playground, which is used by Data Seekers for designing and running analytics jobs based on the datasets they have acquired
- The Data Explorer allows a Data Seeker to browse their own Data Spaces, where the assets they have already bought reside. This is a component which is not depicted in the first version of the architecture, but is going to be part of the second version
- The SSE Engine, which is tasked with encrypting and decrypting the various data based on a Symmetric Searchable Encryption (SSE) scheme
- The Edge Analytics Engine, which is a lightweight analytics implementation, offered to Data Owners to get insights on the data that reside in their Personal App
In this post we briefly cover the first two components of the list above, as developed for the Alpha and Beta releases.
Data Fetcher and Transformation
The Data Fetcher and Transformer component uses a modular micro-service architecture. The fetcher module periodically checks an API selected by the user (from the list of APIs available in the Personal App that can connect to external sources) to find and collect data. The data are then transformed into the DataVaults metadata data model and exported into a MongoDB database for further use.
In its current form, it is possible to configure the schedule for a preconfigured data source and to start and stop the collection schedule. The Data Fetcher and Transformer does not have a user interface of its own, as all configuration is done via the Personal App interface. Its user-management configuration will reside in the backend, in a special multi-user scheme.
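The collection cycle described above (poll a source on a schedule, transform each record, export it, with the ability to start and stop the schedule) can be illustrated with a minimal Java sketch. This is not the DataVaults implementation: the class name `ScheduledFetcher` and its methods are hypothetical, and the source, transformer and sink are plain functional interfaces standing in for the external API, the mapping to the DataVaults metadata model and the MongoDB export.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch: polls a source periodically, transforms each record, hands it to a sink. */
class ScheduledFetcher<R, M> {
    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fetcher");
                t.setDaemon(true);
                return t;
            });
    private final Supplier<List<R>> source;      // stand-in for the selected external API
    private final Function<R, M> transformer;    // stand-in for the metadata-model mapping
    private final Consumer<M> sink;              // stand-in for the database export
    private ScheduledFuture<?> schedule;

    ScheduledFetcher(Supplier<List<R>> source, Function<R, M> transformer, Consumer<M> sink) {
        this.source = source;
        this.transformer = transformer;
        this.sink = sink;
    }

    /** One collection cycle: fetch, transform, export. */
    void runOnce() {
        for (R raw : source.get()) {
            sink.accept(transformer.apply(raw));
        }
    }

    /** Starts the collection schedule; does nothing if it is already running. */
    synchronized void start(long period, TimeUnit unit) {
        if (schedule != null) {
            return;
        }
        schedule = executor.scheduleAtFixedRate(this::safeRun, 0, period, unit);
    }

    /** Stops the collection schedule without interrupting a cycle in progress. */
    synchronized void stop() {
        if (schedule != null) {
            schedule.cancel(false);
            schedule = null;
        }
    }

    synchronized boolean isRunning() {
        return schedule != null;
    }

    void shutdown() {
        stop();
        executor.shutdown();
    }

    // An uncaught exception would silently cancel all future runs, so log and continue.
    private void safeRun() {
        try {
            runOnce();
        } catch (RuntimeException e) {
            System.err.println("Collection cycle failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledFetcher<String, String> fetcher = new ScheduledFetcher<>(
                () -> List.of("steps=4200"),                 // made-up sample record
                raw -> "{\"payload\":\"" + raw + "\"}",      // made-up target format
                System.out::println);
        fetcher.start(200, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        fetcher.shutdown();
    }
}
```

Keeping the three stages behind separate interfaces mirrors the modular design mentioned above: a different data source or target store only requires swapping one argument, while the scheduling logic stays untouched.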
The Data Fetcher and Transformer is written in Java and Kotlin and uses Eclipse Vert.x as its web framework. The resulting micro-services are based on piveau consus.
Query Builder
The Query Builder is the main facility used by Data Seekers to search the cloud-based platform and retrieve datasets shared by the different data owners. In the backend, the Query Builder uses a Triple Store for Linked Data to store, retrieve and search the metadata describing the data requested via the frontend, while the core of each dataset is stored in a non-relational database.
This approach enables a hybrid solution that allows semantic search and interoperability of data at the metadata level, providing functionalities such as those of a data catalogue. At the same time, the core datasets can be indexed (if required) and served faster and more efficiently to the requesting component, directly from the non-relational data store.
For this purpose, apart from the usual APIs to communicate with the non-relational data store, an additional micro-service will provide a RESTful API that translates frontend requests into SPARQL queries.
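As an illustration of what such a translation step might look like, the following Java sketch turns a keyword from a frontend request into a SPARQL query string. It is a sketch under assumptions, not the actual micro-service: the class and method names are hypothetical, and the use of the DCAT and Dublin Core vocabularies (`dcat:Dataset`, `dct:title`) is an assumption about how the metadata is modelled.

```java
/** Hypothetical sketch of translating a frontend keyword search into SPARQL. */
class SparqlTranslator {

    /** Escapes characters that would otherwise break out of a SPARQL string literal. */
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\")   // backslash first, so later escapes are not doubled
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    /** Builds a case-insensitive title search over datasets (vocabulary is an assumption). */
    static String keywordQuery(String keyword, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return String.join("\n",
                "PREFIX dcat: <http://www.w3.org/ns/dcat#>",
                "PREFIX dct: <http://purl.org/dc/terms/>",
                "SELECT ?dataset ?title WHERE {",
                "  ?dataset a dcat:Dataset ;",
                "           dct:title ?title .",
                "  FILTER(CONTAINS(LCASE(STR(?title)), LCASE(\"" + escapeLiteral(keyword) + "\")))",
                "}",
                "LIMIT " + limit);
    }

    public static void main(String[] args) {
        // Prints the generated query for a sample keyword.
        System.out.println(keywordQuery("heart rate", 10));
    }
}
```

Escaping the user-supplied keyword before embedding it in the query matters here, because the frontend input would otherwise be able to alter the structure of the SPARQL query.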
The frontend of the Query Builder is built with the Vue.js 2 framework to match the UX of the rest of the DataVaults cloud-based platform, and it handles various operations that involve calling the APIs of other components. In this respect, data retrieval is performed by the backend APIs, whose results are in turn passed to the APIs of the Access Policy Engine, since access to the queried data is resolved by the Access Policy Engine using information from the ledger as input. Moreover, this component will also interact with the SSE Engine to allow searching over encrypted data in later releases of the platform.
The backend of the Query Builder is written in Java and Kotlin, uses the Eclipse Vert.x web framework and is based on piveau hub. For data storage, an OpenLink Virtuoso server is used.
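The overall Query Builder flow described in this section (search at the metadata level, resolve access, then serve the dataset content from the non-relational store) can be sketched in a few lines of Java. Everything here is hypothetical: in-memory maps stand in for the triple store and the non-relational database, and a simple predicate stands in for the Access Policy Engine, whose real decisions rely on information from the ledger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical sketch of the hybrid lookup; in-memory maps stand in for both stores. */
class HybridQueryFlow {
    // Stand-in for the triple store: dataset id -> title (metadata only).
    private final Map<String, String> metadataIndex = new LinkedHashMap<>();
    // Stand-in for the non-relational store: dataset id -> dataset content.
    private final Map<String, String> documentStore = new LinkedHashMap<>();
    // Stand-in for the Access Policy Engine: (seeker id, dataset id) -> allowed?
    private final BiPredicate<String, String> accessPolicy;

    HybridQueryFlow(BiPredicate<String, String> accessPolicy) {
        this.accessPolicy = accessPolicy;
    }

    void publish(String id, String title, String content) {
        metadataIndex.put(id, title);
        documentStore.put(id, content);
    }

    /** Step 1: metadata-level search that returns the ids of matching datasets. */
    List<String> searchMetadata(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : metadataIndex.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Steps 2 and 3: keep only permitted ids, then load their content from the document store. */
    Map<String, String> query(String seekerId, String keyword) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String id : searchMetadata(keyword)) {
            if (accessPolicy.test(seekerId, id)) {
                result.put(id, documentStore.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up policy and data: every dataset except "ds-2" is accessible.
        HybridQueryFlow flow = new HybridQueryFlow((seeker, id) -> !id.equals("ds-2"));
        flow.publish("ds-1", "Heart rate readings", "payload-1");
        flow.publish("ds-2", "Heart surgery records", "payload-2");
        System.out.println(flow.query("seeker-42", "heart"));
    }
}
```

The point of the split is that only the small metadata index is searched, while the potentially large dataset content is touched solely for results the Data Seeker is allowed to see.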
In the next blog post, we will cover the Data Explorer and the Secure Analytics Playground components.