Relational modeling

Relational data modeling is usually used for operational systems (OLTP systems) that store their data on relational databases, and also used for structured analytical data solutions.

Curved lines on a blue background.

Modeling Strategy

Relational data modeling is often used for analytical data solutions when user requirements for reporting, or other business intelligence (BI) applications are not yet defined, or when the same information is needed for different purposes. For example, if a particular data set is used as a dimension in a dimensional model, the same data set can be used as facts in another data mart. The relational model gives you more flexibility in using the same data in different data marts,  known business needs should be considered whenever possible.

ETL process

The complexity of the ETL process depends on the transformation rules between the source system and the core data model. If the core model is very similar to the data model of the source system, which is usually the case when only one main source system is used, and the transformation is usually straightforward. The level of complexity increases when multiple source systems need to be integrated into a common core data model.

The ETL process for versioning master data is not trivial, but it is easy to implement because the logic is always the same regardless of the data content. They can be easily generated or implemented with the help of a suitable ETL tool. An established method is the implementation of data integration transformations between staging and cleansing areas. In this case, the structure of the cleansing area is the same as the core data model, but without historization. The ETL process between the cleansing area and the core is used only for the versioning of master data and assigning surrogate keys.

Historization

To allow flexible loading of the Slowly Changing Dimensions (SCD) into data marts, the master data (the descriptive data in dimensions) should be versioned in the core data model. Data versioning can be achieved with SCD2 the same way as in the dimensional model, but relationships between entities can create numerous versions when modifying existing data in the referenced entity. The concept of master data versioning allows us to avoid this ripple effect. Two tables are defined for each entity in the relational data model:

  • The head table contains the business key or source system primary key, and all attributes that cannot be changed during the life cycle or are only relevant for the current state (SCD1). It can also be relationships with other entities, such as contract-to-customer relationships.
  • The version table contains all attributes and relationships that are subject to change during their lifecycle, and require change history in at least one data mart (SCD2). Each version has a valid start to end range, and references entities in the corresponding head table.

A version table is defined only if at least one attribute or relationship requires change tracking. This does not apply to addresses (we are only interested in the current address) and incidents (incidents occur once and never change).

To store independent change history for each entity, the head table and the version table must be separated. Foreign keys can be static (stored in the head table) or dynamic (stored in the version table), but they always refer to the corresponding entity's head table, not to a specific version. This allows a high degree of flexibility in historization and prevents the previously mentioned ripple effects.

An important difference between master data versioning and SCD2 is the behavior of key references. References to SCD2 dimensions always refer to the specific version in effect at the time of the event (or load) of the fact. Relationships between entities in the relational model are associated with entity keys (primary keys of head tables) rather than specific versions. The version key is implemented for technical reasons only and is never referenced. Which version is relevant is determined at the time of query or when the data mart is loaded.

Further reading

What is data modeling?

Machen Sie Ihre Daten zukunftsfähig –
mit biGENIUS-X.

Beschleunigen und automatisieren Sie Ihren analytischen Datenworkflow mithilfe der vielseitigen Features von biGENIUS-X.