How a modern data platform enables automated transparency

Nov 03, 2022
  • IT
  • SAP
  • Microsoft Azure
  • data

In recent years, ‘data transparency’ has gained a lot of attention. And it’s not hard to see why: with data playing a decisive role in our lives, we demand to know where information comes from and whether we can trust it. Organizations also need to keep a clear overview of all the data they’re collecting: not just for compliance, but also to tap into its full potential. Here’s where a modern data platform can make all the difference.

The ‘accountability principle’ of the GDPR (Article 5(2)) requires data controllers to be able to demonstrate compliance with the data protection principles, including that personal data is processed in a transparent manner, from the point of data collection onward. And that’s just one example: over the last few years, numerous laws and regulations have been introduced in which data transparency plays a key role.

But data transparency goes beyond legal issues. Knowing what data is available, and how reliable it is, is essential for your data strategy. It’s also a prerequisite for a healthy data economy, in which data is no longer restricted to internal use but can be shared with business partners across the supply chain. Last but not least, your users and customers increasingly demand it.


Controlling the data traffic

Before data can be ‘made transparent’, however, it first needs to be streamlined and collected in a clear and consistent way. “The role of a data platform is to enable the gathering of insights from data and to help users make better-informed decisions. How? By collecting and harmonizing data from various sources within the company or organization(s),” explains delaware’s data platform lead Sebastiaan Leysen. “This includes structured as well as unstructured data, big data, small data sets, etc. Additionally, the platform should enable individual applications, within or outside an organization, to communicate in real time.”

Sebastiaan uses the analogy of an air traffic control tower: “You could compare an organization’s business applications, such as ERP, CRM and HR platforms, to airplanes. All of them communicate with the control tower in near real time, often following an event-driven paradigm, to exchange the process information they need with each other. The ‘tower’ is the data platform: it brokers information among the airplanes, orchestrates data movements, validates incoming data, monitors data flows, harmonizes and consolidates data streams, and serves the data to other internal and/or external parties.”

illustration of a control tower as a data platform
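To make the control tower analogy more concrete, here is a minimal, in-memory Python sketch of a broker that validates incoming events, notifies the applications subscribed to them and logs every flow for monitoring. The `ControlTower` and `Event` classes, the event name `request.created` and the format of the validation rules are hypothetical, chosen only for illustration; they are not part of any product mentioned in this article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    source: str    # sending application, e.g. "CRM" (illustrative)
    type: str      # e.g. "request.created" (hypothetical event name)
    payload: dict


Handler = Callable[[Event], None]


class ControlTower:
    """In-memory sketch of a broker that validates, routes and monitors events."""

    def __init__(self, required_fields: dict[str, set[str]]) -> None:
        # Validation rules (hypothetical format): event type -> mandatory payload keys.
        self.required_fields = required_fields
        # Routing table: event type -> handlers of the interested applications.
        self.subscribers: defaultdict[str, list[Handler]] = defaultdict(list)
        # Monitoring: one (source, type, status) entry per event received.
        self.flow_log: list[tuple[str, str, str]] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.subscribers[event_type].append(handler)

    def publish(self, event: Event) -> bool:
        # Validate first: events missing mandatory fields are rejected and logged.
        missing = self.required_fields.get(event.type, set()) - set(event.payload)
        if missing:
            self.flow_log.append((event.source, event.type, "rejected"))
            return False
        # Notify every application that subscribed to this event type.
        for handler in self.subscribers[event.type]:
            handler(event)
        self.flow_log.append((event.source, event.type, "delivered"))
        return True
```

In practice, this brokering role is usually filled by a managed messaging service (on Azure, for example, Event Grid or Service Bus) rather than hand-written code; the sketch only shows the responsibilities involved.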

Single source of truth, many use cases

In our vision of a modern data platform, all of this would happen automatically. “When a request is registered in the organization’s CRM, for example, all the applications interested in this event would get notified in near real time,” Sebastiaan explains. “In addition, each event would subsequently be channeled into a central data store (on Azure or SAP, for example), where it would feed a canonical data model implemented using tools like Databricks, Azure Synapse or SAP Datasphere. The end result is a ‘single source of truth’ that facilitates reliable decision-making.”
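The path from application event to ‘single source of truth’ can be sketched as follows, assuming two hypothetical source layouts: a CRM that sends `AccountNumber`, `AccountName` and `Country`, and an ERP that uses SAP-style field names (`KUNNR`, `NAME1`, `LAND1`), used here purely for illustration. Each source gets its own mapping onto one shared `CanonicalCustomer` definition, and a `CentralStore` keeps both the raw event history and the harmonized view. The class names and the ‘latest event wins’ merge policy are simplifying assumptions; in a real platform, this model would be implemented in tools like those named above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CanonicalCustomer:
    """One shared definition of 'customer', whatever application it came from."""
    customer_id: str
    name: str
    country: str  # two-letter country code, upper case


# Hypothetical source layouts: each application names the same facts differently.
def from_crm(rec: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=str(rec["AccountNumber"]),
        name=rec["AccountName"].strip(),
        country=rec["Country"].upper(),
    )


def from_erp(rec: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=str(rec["KUNNR"]).lstrip("0"),  # assumes zero-padded IDs in the ERP
        name=rec["NAME1"].strip(),
        country=rec["LAND1"].upper(),
    )


MAPPERS: dict[str, Callable[[dict], CanonicalCustomer]] = {"CRM": from_crm, "ERP": from_erp}


class CentralStore:
    """Keeps every incoming event plus a canonical, deduplicated customer view."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []           # append-only history
        self.customers: dict[str, CanonicalCustomer] = {}  # current canonical state

    def ingest(self, source: str, record: dict) -> CanonicalCustomer:
        self.events.append((source, record))
        customer = MAPPERS[source](record)
        self.customers[customer.customer_id] = customer  # simplification: latest event wins
        return customer
```

Because both sources map onto the same key, a CRM update and an ERP update for the same customer end up as one canonical record instead of two conflicting copies.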

This centrally curated information can then feed a variety of use cases, such as a B2B or B2C customer portal. “Ideally, everything you see on such a portal would be generated by the platform and kept in sync automatically, based on the events and data from other systems. Organizations could even implement specific dissemination rules to control which information is disclosed. In such a design, no one would need to manually ‘publish’ anything on the portal: it would all be done automatically, according to predefined rules.”
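Dissemination rules like these could, for instance, be expressed as plain data plus a small filter function. In this sketch, the audiences (`b2b`, `b2c`), the field names and the ‘drafts are never published’ rule are hypothetical examples; the point is that the portal view is derived from curated records, so nobody publishes anything by hand.

```python
# Hypothetical dissemination rules: which fields each portal audience may see.
FIELD_RULES: dict[str, frozenset[str]] = {
    "b2b": frozenset({"order_id", "status", "expected_delivery", "net_price"}),
    "b2c": frozenset({"order_id", "status", "expected_delivery"}),
}


def is_publishable(record: dict) -> bool:
    """Hypothetical record-level rule: drafts are never disclosed."""
    return record.get("status") != "draft"


def portal_view(records: list[dict], audience: str) -> list[dict]:
    """Derive the portal content for one audience from curated records."""
    allowed = FIELD_RULES.get(audience, frozenset())  # unknown audience: no fields
    return [
        {key: value for key, value in record.items() if key in allowed}
        for record in records
        if is_publishable(record)
    ]
```

Rerunning `portal_view` whenever the curated data changes keeps the portal in sync, and internal fields (such as a margin) never leave the platform because no rule allows them.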

Data lake house architecture

The core of most modern data platforms is a data lake house architecture. Sebastiaan: “This setup combines the best features of a data warehouse with those of a data lake. It means you organize and curate your storage in logical zones, while also enjoying the flexibility to work with any data variety (format), volume (small or big) and velocity (batch or real-time processing) imaginable. Want to extract text from a PDF or merge .csv files? No problem at all.”
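As a small illustration of the ‘merge .csv files’ case, the sketch below combines CSV files with differing columns into one consistent set of rows, using only Python’s standard `csv` module. It assumes the files are already loaded as text (a dict mapping file name to content), and the `source_file` column it adds to retain basic lineage is an illustrative choice. On an actual lakehouse, this would typically be done with a distributed engine such as Spark instead.

```python
import csv
import io


def merge_csv(sources: dict[str, str]) -> list[dict[str, str]]:
    """Merge several CSV documents (file name -> text) into one list of rows.

    Columns missing from a given file are filled with "" so that every row
    shares the same schema; a `source_file` column records where it came from.
    """
    rows: list[dict[str, str]] = []
    columns: list[str] = []  # union of all headers, in first-seen order
    for name, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)
        for row in reader:
            rows.append({**row, "source_file": name})
    schema = columns + ["source_file"]
    return [{col: row.get(col, "") for col in schema} for row in rows]
```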

In line with this architecture, other applications in the organization’s IT landscape could also store some of their data directly in the data lake, where it could be further processed for downstream consumption. Often, the platform itself is surrounded by ‘data marts’: databases holding subsets of data tailored to specific purposes and use cases. “This ‘data curation’ is needed to prevent your data lake from turning into a data swamp.”

Canonical data

Extracting so-called ‘curated data’ would require only a few lines of code. “The ‘canonical data model’ focuses heavily on reusability,” Sebastiaan continues. “This way, we ensure that each transformation needs to be defined only once, and that the resulting curated data can be consumed as reusable data products. This allows data engineers, scientists, analysts and managers to focus on added value instead of overhead tasks like data orchestration, data export and lineage tracking.”
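The ‘define once, reuse everywhere’ idea can be sketched with a small catalog in which each data product is registered exactly once, together with the inputs it depends on. Because dependencies are declared, lineage can be derived automatically instead of being documented by hand. The `Catalog` class, its methods and the sample data are hypothetical; on real platforms, dedicated catalog and lineage tools typically fill this role.

```python
from typing import Callable

Rows = list[dict]


class Catalog:
    """Hypothetical catalog in which every data product is defined exactly once."""

    def __init__(self) -> None:
        self._sources: dict[str, Rows] = {}
        # product name -> (names of its inputs, transformation)
        self._products: dict[str, tuple[tuple[str, ...], Callable[..., Rows]]] = {}

    def add_source(self, name: str, rows: Rows) -> None:
        self._sources[name] = rows

    def product(self, name: str, depends_on: tuple[str, ...]):
        """Decorator registering a transformation; defining a name twice is an error."""
        def register(fn: Callable[..., Rows]) -> Callable[..., Rows]:
            if name in self._products or name in self._sources:
                raise ValueError(f"{name!r} is already defined")
            self._products[name] = (depends_on, fn)
            return fn
        return register

    def get(self, name: str) -> Rows:
        """Compute a product by first resolving its inputs (no caching in this sketch)."""
        if name in self._sources:
            return self._sources[name]
        depends_on, fn = self._products[name]
        return fn(*(self.get(dep) for dep in depends_on))

    def lineage(self, name: str) -> set[str]:
        """All upstream products and sources, derived from the declarations."""
        if name in self._sources:
            return set()
        depends_on, _ = self._products[name]
        upstream = set(depends_on)
        for dep in depends_on:
            upstream |= self.lineage(dep)
        return upstream


# Illustrative usage with made-up data.
catalog = Catalog()
catalog.add_source("crm_customers", [
    {"id": 1, "country": "BE", "active": True},
    {"id": 2, "country": "NL", "active": False},
    {"id": 3, "country": "BE", "active": True},
])


@catalog.product("active_customers", depends_on=("crm_customers",))
def active_customers(customers: Rows) -> Rows:
    return [c for c in customers if c["active"]]


# Reuses the product above instead of re-implementing the "active" filter.
@catalog.product("active_customers_by_country", depends_on=("active_customers",))
def active_customers_by_country(active: Rows) -> Rows:
    counts: dict[str, int] = {}
    for c in active:
        counts[c["country"]] = counts.get(c["country"], 0) + 1
    return [{"country": k, "active_customers": n} for k, n in sorted(counts.items())]
```

Here, `catalog.lineage("active_customers_by_country")` reports both `active_customers` and `crm_customers` without anyone maintaining that list separately.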

In essence, there are three ‘stages’ of data:

  • Stage 1 – Raw: This is data in its native format, as received from its source. It’s unfiltered and unpurified, before any transformation. It should be immutable and provided in a read-only format.
  • Stage 2 – Prepare: Data in this stage is validated, standardized and harmonized, and has a high level of reliability. It consists of reusable building blocks for logical data models.
  • Stage 3 – Serve: This data is ready to be consumed by other systems: it’s optimized for reading and customized for specific use cases. 
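The three stages can be sketched as three functions over a batch of order records. Raw records are copied and wrapped read-only (`MappingProxyType`) so they cannot be modified in place; the prepare stage validates and standardizes them, skipping rows it cannot parse; and the serve stage builds one read-optimized aggregate for a specific use case. The field names, the decimal-comma handling and the revenue-by-country use case are hypothetical examples.

```python
from collections.abc import Mapping
from types import MappingProxyType


def land_raw(records: list[dict]) -> tuple[Mapping, ...]:
    """Stage 1 - Raw: keep records exactly as received, as read-only copies.

    MappingProxyType blocks item assignment; it is shallow, so nested
    values would need the same treatment in a fuller implementation.
    """
    return tuple(MappingProxyType(dict(r)) for r in records)


def prepare(raw: tuple[Mapping, ...]) -> list[dict]:
    """Stage 2 - Prepare: validate, standardize and harmonize; skip invalid rows."""
    prepared = []
    for r in raw:
        try:
            row = {
                "order_id": str(r["order_id"]).strip(),
                "country": str(r["country"]).strip().upper(),
                # Hypothetical rule: accept both "12,50" and "12.50".
                "amount": round(float(str(r["amount"]).replace(",", ".")), 2),
            }
        except (KeyError, ValueError):
            continue  # invalid record: left out here, but still present in raw
        prepared.append(row)
    return prepared


def serve_revenue_by_country(prepared: list[dict]) -> dict[str, float]:
    """Stage 3 - Serve: a read-optimized aggregate for one specific use case."""
    totals: dict[str, float] = {}
    for row in prepared:
        totals[row["country"]] = round(totals.get(row["country"], 0.0) + row["amount"], 2)
    return totals
```

Invalid rows disappear only from the prepared stage: they remain available in the raw stage, which is what makes it possible to investigate or reprocess them later.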

Don’t go it alone

“A solid, intelligent, modern data platform can take care of many data needs,” says Sebastiaan. “However, setting up such a platform requires a wide range of expertise and departments to come together and collaborate, which can be tricky in highly structured, hierarchical organizations. Often, new applications have to be built from scratch as well. In any case, you’ll want to make sure you’re making the right decisions. Being able to count on a strategic partner with both the business experience and the technical skills to pull it off is a major advantage. So is adhering to the ‘fail fast, learn faster’ principle: you start small, so you quickly get relevant feedback on what works and what doesn’t. That way, you can build a platform that truly fits your specific needs.”

Looking for ways to streamline data flows in your organization and boost efficiency and transparency? Talk to our experts!
