• QBIX Analytics

Data Lake vs Data Warehouse

The value of solid data to a business is hard to overstate in today's competitive landscape. Collecting and using data is no longer an optional add-on but a necessary part of running operations, and companies that use their data effectively tend to be better positioned to grow.

Over the last few years, companies' interest in ‘big data’ has grown significantly. Even so, companies should not blindly jump on the big data bandwagon: without a plan and strategy in place, the data you collect will be of little use.

One major question facing organizations is whether to choose a data warehouse or a data lake. Making the right choice requires a good understanding of both.

What Is A Data Lake?

James Dixon, who is credited with coining the term, describes a data lake by analogy: if a data mart, which is a type of data warehouse, is thought of as bottled water (cleansed, packaged, and structured for easy consumption), then a data lake is a large body of water left in its natural state.

A data lake is a system that holds a large amount of data in its natural, raw form until it is needed. The data does not have to be structured first. The lake accepts all data from source systems, and the data requirements and schema are defined only when the data is queried for a specific analysis, an approach often called schema-on-read.
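To make the idea of defining the schema only at query time concrete, here is a minimal Python sketch. The sample records, the `read_with_schema` helper, and the field names are all hypothetical and chosen purely for illustration; a real data lake would typically sit on object storage and be queried with a dedicated engine rather than a Python loop.

```python
import json

# Raw records as they might land in a lake: mixed shapes, nothing validated.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "ana", "action": "purchase", "amount": "19.99"}',
    '{"user": "ben", "action": "page_view", "url": "/pricing"}',
    '{"sensor_id": 7, "temperature_c": 21.4}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    Keeps only records that contain every field named in `schema`,
    and casts each kept field with the callable given for it.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


# Two different analyses define two different schemas over the same raw data.
purchases = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
readings = read_with_schema(RAW_EVENTS, {"sensor_id": int, "temperature_c": float})

print(purchases)
print(readings)
```

Nothing was rejected when the records were stored; each analysis simply picks out and types the fields it cares about when it reads them.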

What Is A Data Warehouse?

A data warehouse is a system used for data analysis and reporting and is considered a key component of business intelligence. It is a central repository of current and historical data that is used to create analytic reports for workers across an enterprise. It collects data from operational systems as well as from external sources. The data is highly structured and transformed, and it is not loaded until its purpose has been clearly defined.
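The "structure first, load second" behaviour described above can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. This is only an illustration under stated assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and a production warehouse would use a dedicated platform and a proper loading pipeline.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (an assumption
# made for illustration; it is not a real warehouse platform).
conn = sqlite3.connect(":memory:")

# The schema is fixed before any data is loaded.
conn.execute(
    """
    CREATE TABLE sales (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
    """
)

# Hypothetical incoming rows; two of them do not fit the schema.
incoming = [
    (1, "ana", 19.99),
    (2, "ben", None),   # missing amount violates NOT NULL
    (3, "cy", -5.0),    # negative amount violates the CHECK constraint
]

loaded, rejected = 0, 0
for row in incoming:
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError:
        # Rows that break the declared structure never enter the table.
        rejected += 1

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(loaded, rejected, total)
```

Because the structure is enforced at load time, reports built on `sales` can rely on every row being complete and valid, at the cost of turning away data that does not fit.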

Differences Between Data Lakes & Data Warehouses

Retention of Data: A data warehouse is developed as a highly structured reporting model, which requires decisions about what data to include and what to leave out. Data that has no defined use, or that does not answer specific questions, may be excluded from the warehouse. In contrast, a data lake does not turn away data, since information with no known use today might be useful tomorrow.

Data Type Support: Data in a data warehouse generally comes from transactional systems and consists of quantitative metrics and the attributes that describe them. Non-traditional sources such as sensor data, text and images, web server logs, and social network activity are largely ignored. In contrast, a data lake embraces all data types, including non-traditional ones.

User Support: A data warehouse caters well to the ‘operational’ users who often make up about 80% of the data users in a company, but it does not fully serve the next 10%, who carry out their own analysis on the data. The remaining 10%, a group that can include data scientists doing deep data analysis, may be ignored entirely. A data lake, however, supports all of these users.

Resource Consumption: Making changes in a data warehouse takes time and consumes development resources. The data loading process is complex, and many business questions cannot wait that long for an answer. In a data lake, data is stored raw and a schema is applied only when it is read, so business questions can be answered at the user's pace.

Gathering Insight: A data lake can give users results faster than a data warehouse, because it accommodates all data and data types and allows access to the information before it has been transformed.

Choosing Between Data Lake & Data Warehouse

It is important to understand your data needs before choosing between a data lake and a data warehouse. You need to know what type of data will be stored, where it comes from, what resources are available to you, and what the data will be used for. This knowledge will guide your decision. If you're just starting out, weigh the pros and cons of each option against your business needs.