Data modeling is a technique for organizing data in a logical structure that allows for the efficient storage, retrieval, and manipulation of data within a data warehouse.
It covers the design and construction of the logical structure of data in a database, including tables, columns, and the relationships between the data. There are several data modeling approaches, each with its own set of advantages and disadvantages. It is worth noting that these methodologies are not mutually exclusive; a single project may combine several of them. The choice of which to use depends on the specific requirements and constraints of the organization. Some of the most common data modeling approaches include:
Conceptual data modeling: focuses on identifying the main entities and relationships within a system, and is used to create high-level views of the data.
Logical data modeling: focuses on creating detailed representations of the data structures and relationships, regardless of the choice of technologies or database management systems.
Physical data modeling: focuses on creating detailed representations of the data structures and relationships, while taking into account the specific technologies and database management systems that will be used to implement the model.
NoSQL data modeling: optimized for scalability and high performance in non-relational databases. It is modeled with different data structures, including documents, key-value pairs, and graphs, and is mainly used for big data and real-time analytics.
Dimensional data modeling: optimized for read performance. It is typically modeled in a star or snowflake schema with a coarse-grained granularity, mainly dealing with slowly changing dimensions, and used for reporting and analytics.
Relational data modeling: optimized for write performance. It is typically modeled in third normal form (3NF), with fine-grained granularity, mainly dealing with slowly changing facts, and used for transactional systems.
Data vault modeling: optimized for scalability and change. It is modeled in a hub-and-spoke structure, with hybrid granularity, mainly dealing with full history of data, and used for reporting and analytics.
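A key point of difference among the approaches above is how they handle change over time: dimensional models track slowly changing dimensions, while data vault keeps the full history. As a minimal sketch (the customer dimension, column names, and dates are hypothetical, and SQLite is used only to keep the example runnable), a Type 2 slowly changing dimension preserves history by closing the current row and inserting a new version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk  INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  TEXT,                 -- business key
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                 -- NULL = still valid
        is_current   INTEGER
    )""")

def scd2_update(conn, customer_id, new_city, as_of):
    """SCD Type 2: close the current version, insert a new one."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (as_of, customer_id))
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, as_of))

# Initial load, then an attribute change arrives from the source system
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES ('C1', 'Zurich', '2020-01-01', NULL, 1)")
scd2_update(conn, "C1", "Bern", "2023-06-01")

rows = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY valid_from").fetchall()
# Both versions survive: the old row is closed, the new row is current
print(rows)
```

Nothing is overwritten: a query can reconstruct the dimension as it looked on any past date by filtering on the validity interval.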
In a data warehouse, it is recommended to implement a central core layer. Its purpose is the integration of data from multiple source systems and the historization of the data. The core is the only source for the data marts that are used for BI applications such as reporting platforms or OLAP tools.
Data marts are usually designed as a dimensional data model with dimensions and facts. This model is implemented either as a star schema or as a multidimensional cube. But what kind of data model is appropriate for the core layer? There is no single approach that suits all analytical data solutions; instead, several modeling techniques are in common use, each with its own advantages depending on the requirements and the modeling strategy:
Dimensional data modeling, also known as the star schema or Kimball approach, is a set of techniques and concepts proposed by Ralph Kimball and is widely used in data warehouse design. Over the last two decades it has proven well suited to reporting needs, because it models the data in a way that is optimized for user queries.
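As an illustrative sketch (table and column names are hypothetical, and SQLite is used only to keep the example self-contained), a star schema centers one fact table holding additive measures on a set of denormalized dimension tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimensions: descriptive attributes kept inline
    CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, full_date TEXT,
                              year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY,
                              product_name TEXT, category TEXT);

    -- Fact table: foreign keys to the dimensions plus additive measures
    CREATE TABLE fact_sales (
        date_sk    INTEGER REFERENCES dim_date(date_sk),
        product_sk INTEGER REFERENCES dim_product(product_sk),
        quantity   INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_date    VALUES (1, '2024-01-15', 2024, 1);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'),
                                   (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales  VALUES (1, 1, 10, 100.0),
                                   (1, 2,  5,  75.0);
""")

# Typical BI query: aggregate the measures, slice by dimension attributes
revenue_by_category = conn.execute("""
    SELECT d.year, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date    d ON d.date_sk    = f.date_sk
    JOIN dim_product p ON p.product_sk = f.product_sk
    GROUP BY d.year, p.category
""").fetchall()
print(revenue_by_category)
```

Every analytical query follows the same simple join pattern from the fact table outward, which is what makes the shape predictable for BI tools and fast to read.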
The architecture contains the following main layers:
A relational data warehouse is based on the source data (source system or ODS based on a source system) and transforms the data into a relational schema.
Relational data modeling is usually used for operational systems (OLTP systems) that store their data in relational databases. But it can also be used for data warehouses – most structured DWH databases are implemented on relational database systems.
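For contrast with the star schema, here is a minimal 3NF sketch (hypothetical order-entry tables, SQLite for illustration only): every attribute is stored exactly once and referenced via keys, which is what makes writes cheap and consistent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: customer attributes live in one place, not repeated per order
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        city        TEXT
    );
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        order_date  TEXT
    );
    CREATE TABLE order_line (
        order_id    INTEGER REFERENCES sales_order(order_id),
        line_no     INTEGER,
        product     TEXT,
        quantity    INTEGER,
        PRIMARY KEY (order_id, line_no)
    );

    INSERT INTO customer    VALUES (1, 'Acme', 'Zurich');
    INSERT INTO sales_order VALUES (100, 1, '2024-01-15');
    INSERT INTO order_line  VALUES (100, 1, 'Widget', 10),
                                   (100, 2, 'Gadget', 5);
""")

# A changed address is a single-row update: no redundant copies to keep in sync
conn.execute("UPDATE customer SET city = 'Bern' WHERE customer_id = 1")

row = conn.execute("""
    SELECT c.city, COUNT(l.line_no)
    FROM customer c
    JOIN sales_order o ON o.customer_id = c.customer_id
    JOIN order_line  l ON l.order_id    = o.order_id
    GROUP BY c.city
""").fetchone()
print(row)
```

The trade-off is visible in the query: reading the same information back requires more joins than the star schema, which is why 3NF favors transactional workloads over reporting.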
The relational data warehouse approach differs from the star schema / Kimball approach in the following aspects:
Data vault modeling, developed by Dan Linstedt in the 1990s, is used to model the enterprise data warehouse core layer. This approach is suitable for multi-source environments that need to adapt quickly to change. The model separates the data into three base constructs, each stored as a separate entity type: hubs (business keys), links (relationships between hubs), and satellites (descriptive, historized attributes):
Data Vault should store "all data, any time", and the data should not be cleansed. A data vault data warehouse is based on the source data (source system or ODS based on a source system) and transforms the data into a data vault model (hubs, links, and satellites in a raw vault area and a business vault area).
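A minimal raw-vault sketch of the three constructs (table names, hash keys, and values are hypothetical, with SQLite used only to keep the example runnable): hubs carry business keys, links carry relationships, and satellites carry descriptive attributes historized by load date, so nothing is ever updated or deleted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hubs: one row per unique business key
    CREATE TABLE hub_customer (
        customer_hk   TEXT PRIMARY KEY,   -- hash key of the business key
        customer_id   TEXT,
        load_date     TEXT,
        record_source TEXT
    );
    CREATE TABLE hub_order (
        order_hk      TEXT PRIMARY KEY,
        order_id      TEXT,
        load_date     TEXT,
        record_source TEXT
    );

    -- Link: a relationship between hubs
    CREATE TABLE link_customer_order (
        link_hk       TEXT PRIMARY KEY,
        customer_hk   TEXT REFERENCES hub_customer(customer_hk),
        order_hk      TEXT REFERENCES hub_order(order_hk),
        load_date     TEXT,
        record_source TEXT
    );

    -- Satellite: descriptive attributes, historized by load_date
    CREATE TABLE sat_customer (
        customer_hk TEXT REFERENCES hub_customer(customer_hk),
        load_date   TEXT,
        city        TEXT,
        PRIMARY KEY (customer_hk, load_date)
    );

    INSERT INTO hub_customer VALUES ('hk_c1', 'C1', '2024-01-01', 'crm');
    INSERT INTO hub_order    VALUES ('hk_o1', 'O1', '2024-01-01', 'erp');
    INSERT INTO link_customer_order
                             VALUES ('hk_l1', 'hk_c1', 'hk_o1', '2024-01-01', 'erp');
    -- A changed attribute is simply a new satellite row: insert-only loading
    INSERT INTO sat_customer VALUES ('hk_c1', '2024-01-01', 'Zurich'),
                                    ('hk_c1', '2024-06-01', 'Bern');
""")

history = conn.execute(
    "SELECT load_date, city FROM sat_customer ORDER BY load_date").fetchall()
print(history)
```

Because new sources only add hubs, links, or satellites rather than altering existing tables, the structure adapts to change without reworking what is already loaded.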
Each data modeling approach has its own set of advantages and disadvantages. The decision of which one to use depends on the specific requirements and constraints of your organization: