The data lakehouse as your platform to the future

Apr 25, 2022

To become a successful data-driven organisation, it is critical to set up a data platform that can handle data streams from various sources and translate this raw information into actionable insights.

These data platforms are traditionally built using either a data warehouse or a data lake approach. Businesses had to decide which option would serve their unique situation best. With the new data lakehouse approach, we can now combine the capabilities of both.

Taking a closer look at what is in a data management platform

First, let’s take a closer look at what constitutes a data (management) platform. The specifics will differ in every organisation, but broadly speaking, we can differentiate between 5 layers:

  1. Data sources: These are the internal or external sources of information that are not part of the data platform.
  2. Ingestion layer: Here, raw data is ingested and ‘unlocked’ within the data platform. This can happen in three ways: in batches (pull), via streaming (push) or through replication.
  3. Raw data layer: A copy of the raw data is then stored in a data lake or data warehouse.
  4. Centrally processed data: Inside the data warehouse or data lake, data is then processed and prepared for further usage. While a data warehouse typically contains structured data (mainly for reporting purposes), a data lake is better suited to unstructured and big data (e.g., for data science purposes).
  5. Serve & consume: In this layer, processed data is analysed, reported on and/or distributed.

Breakdown of data (management) platform
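
The five layers described above can be sketched as a small pipeline. This is only an illustrative toy, not a reference implementation: the sample feeds (`CRM_EXPORT`, `WEB_EVENTS`) and all function names are hypothetical, the 'storage' is an in-memory list, and replication-based ingestion is left out for brevity.

```python
import json

# Layer 1: data sources live outside the platform (hypothetical sample feeds).
CRM_EXPORT = '[{"customer": "A", "amount": "120.50"}, {"customer": "B", "amount": "80"}]'
WEB_EVENTS = ['{"customer": "A", "amount": "19.50"}']


def ingest_batch(export: str) -> list[dict]:
    """Layer 2 (batch/pull): fetch a complete export in one go."""
    return json.loads(export)


def ingest_stream(events: list[str]) -> list[dict]:
    """Layer 2 (streaming/push): handle events one at a time as they arrive."""
    return [json.loads(event) for event in events]


def store_raw(*batches: list[dict]) -> list[dict]:
    """Layer 3: keep an unmodified copy of everything that was ingested."""
    return [dict(record) for batch in batches for record in batch]


def process(raw: list[dict]) -> dict[str, float]:
    """Layer 4: convert amounts to numbers and aggregate them per customer."""
    totals: dict[str, float] = {}
    for record in raw:
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(record["amount"])
    return totals


def serve(totals: dict[str, float]) -> list[str]:
    """Layer 5: turn processed data into report lines."""
    return [f"{customer}: {total:.2f}" for customer, total in sorted(totals.items())]


raw_layer = store_raw(ingest_batch(CRM_EXPORT), ingest_stream(WEB_EVENTS))
report = serve(process(raw_layer))
print(report)
```

Note that the raw layer still holds the amounts as text, exactly as they arrived; only the processing layer converts them to numbers.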

Combining the best of both worlds

Imagine a warehouse stocked with well-organised components in neat rows and stacks. Now, think of a lake, full to the brim with water, fish and other objects all jumbled together with no immediate order imposed on them. It’s relatively straightforward to find and access a specific object located in a warehouse – while it requires different processes to identify and extract specific content from a lake.

Like their namesakes, data warehouses and data lakes differ quite profoundly in how they store and process what fills them: information.

Data warehouse

A data warehouse deals best with moderate amounts of structured data, which is used mainly in reporting and service delivery.

Data lake

A data lake is best at handling large amounts of raw and unstructured data, which is used mainly in data science, machine-learning exploration, and similar applications.
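
One common way to describe this difference, using terms that do not appear in this article, is 'schema-on-write' for the warehouse and 'schema-on-read' for the lake. The sketch below illustrates the idea under that assumption; `SALES_SCHEMA` and the function names are hypothetical, and plain Python lists stand in for real storage.

```python
import json

# Hypothetical schema for a warehouse table: column name -> required type.
SALES_SCHEMA = {"order_id": int, "amount": float}


def warehouse_insert(table: list[dict], row: dict) -> None:
    """Schema-on-write: reject rows that do not match the schema exactly."""
    if set(row) != set(SALES_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(row)}")
    for column, expected in SALES_SCHEMA.items():
        if not isinstance(row[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(row)


def lake_put(lake: list[str], payload: str) -> None:
    """Data lake: store whatever arrives, untouched."""
    lake.append(payload)


def lake_read_amounts(lake: list[str]) -> list[float]:
    """Schema-on-read: interpret raw payloads only when they are queried."""
    amounts = []
    for payload in lake:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # free text; this particular reader skips it
        if isinstance(record, dict) and "amount" in record:
            amounts.append(float(record["amount"]))
    return amounts


warehouse: list[dict] = []
warehouse_insert(warehouse, {"order_id": 1, "amount": 99.0})
try:
    warehouse_insert(warehouse, {"order_id": "2", "amount": "n/a"})
    rejected = False
except ValueError:
    rejected = True  # the malformed row never reaches the table

lake: list[str] = []
lake_put(lake, '{"order_id": 1, "amount": 99.0}')
lake_put(lake, '{"amount": "12.5", "note": "no order id"}')
lake_put(lake, "customer called, wants a refund")
```

The warehouse stays tidy but refuses anything unexpected, while the lake accepts everything and pushes the work of making sense of it to whoever reads the data.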

Do we really need to choose between a data lake and a data warehouse?

The main problem with this either/or approach? Today’s companies need to be able to handle all types of data and use them in all types of scenarios. In other words, having to choose between a data lake and a data warehouse is almost always a case of choosing the lesser evil. This is why many organisations now use both in tandem, which leads to higher levels of complexity and duplicated data.

Enter the data lakehouse

Enter the data lakehouse: an open architecture that combines the best features of – you guessed it – data lakes and data warehouses, with increased efficiency and flexibility as a result. Made possible by the rising trend of open and standardised system design, data lakehouses can apply the structured approach of a warehouse to the wealth of data contained in a data lake.

The main features of a data lakehouse

  • Handle various data types: structured, unstructured and semi-structured
  • Enjoy simplified data governance and enforce data quality across the board
  • Get business intelligence support directly on the source data, which means that BI users and data scientists work from the same repository
  • Benefit from increased scalability in terms of users and data sizes
  • Get support for data science, machine learning, Structured Query Language (SQL), and analytics – all in one place
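
To make these features more concrete, here is a deliberately small sketch of a single repository that enforces a quality rule on write and then serves both a BI-style aggregate and a data-science-style feature view. Everything in it is an assumption made for illustration: the `Lakehouse` class, the `REQUIRED` rule and the sample records are hypothetical, and an in-memory list stands in for the storage and query engine a real platform would provide.

```python
from collections import defaultdict

# Hypothetical quality rule: every record needs these typed core fields.
REQUIRED = {"region": str, "revenue": float}


class Lakehouse:
    """Toy single repository: validated core fields plus free-form extras."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def write(self, record: dict) -> bool:
        """Enforce data quality on write; return False for rejected records."""
        for field, expected in REQUIRED.items():
            if not isinstance(record.get(field), expected):
                return False
        self.records.append(record)
        return True

    def revenue_by_region(self) -> dict[str, float]:
        """BI-style view, comparable to a SQL SUM(revenue) grouped by region."""
        totals: dict[str, float] = defaultdict(float)
        for record in self.records:
            totals[record["region"]] += record["revenue"]
        return dict(totals)

    def feature_rows(self) -> list[tuple[float, int]]:
        """Data-science-style view: (revenue, review length) per record."""
        return [(r["revenue"], len(r.get("review", ""))) for r in self.records]


store = Lakehouse()
store.write({"region": "EU", "revenue": 100.0, "review": "fast delivery"})
store.write({"region": "EU", "revenue": 50.0})
store.write({"region": "US", "revenue": 70.0, "review": "ok"})
accepted = store.write({"region": "US", "revenue": "unknown"})  # fails the rule

print(store.revenue_by_region())
print(store.feature_rows())
```

Both views read the same records, so there is no second copy of the data to keep in sync, and the semi-structured `review` field sits alongside the validated core fields.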

Unlocking innovation

By simplifying enterprise data infrastructure, safeguarding data quality and increasing the opportunities for exploratory data science, the data lakehouse holds the key to future innovation for many companies. Software vendors seem to agree: those with roots in either data warehouses or data lakes are making a lot of effort to create their own hybrid ‘data lakehouse’ solutions. As such, you do not necessarily need to invest in two different technologies to get yourself a data lakehouse.

While many vendors are claiming the term ‘data lakehouse’, it’s important to keep their histories in mind when making a decision. The key is to keep the big picture in mind, and find a solution that works on your terms and takes your data quality and data governance rules into account. At delaware, we combine expert knowledge of every available platform with business experience in numerous sectors, which makes us uniquely qualified to help you pick the solution that best fits your needs.
