For this type of dimension the version history is tracked in a separate table. e.g.
|ID||Version||Name||Salary||H / W / A||Valid From|
|ID||Version||Name||Salary||H / W / A||Valid From||Valid To|
You can see that Joe Bloggs’ salary has remained unchanged and has no records in the history table. However, Jane’s salary has changed three times and the three old records are in the history table.
This can allow for minimal logging or even “as at” reporting. For example, if the fact table holds the “ID” and “Version”. The ID can be used as a foreign key to the main to give an “as is” result, or the “ID” and “Version” can be used as a foreign key to changed history table to give the “as at” result. Note that when a fact is first inserted it will acquire the current version of the dim.
With this understanding, here is how I go about populating and maintaining the two tables associated. For the example this following will happen
- Stage records that contain Marvel Super Heros (Name, Real First Name, Real Surname)
- Record Hero’s Name, First Name and Surname into a warehouse table
- Maintain history of changes that could be referenced
Create all of the tables and schemaa
CREATE Schema Staging;
- Staging.MarvelHero : Holds that entire dataset of the external dimension coming in
- Warehouse.dim_MarvelHero : Main dimension table with the warehouse. Additional Columns:
- ID : Primary Key;
- Version : Each time there is a change this will be incremented by one;
- ValidFrom : A date to represent when this version became valid
- ValidTo : If the record is soft deleted this will record the deleted date
- isDeleted : Records if the record is soft deleted
- Warehouse.dim_MarvelHero__history : As above but storing all version and not just the current one
Load up the staging table:
INSERT INTO Staging.MarvelHero
I’ve created a typo as an example of a correction that will result in two versions by the end.
Now for the stored procedure that will handle populating the dimensional tables for staging whilst obeying the conditions of a type 4 dimension mentioned at the start.
CREATE PROCEDURE Warehouse.usp_Populate_dim_MarvelHero
I have commented the code, but will go into some detail about each block:
- A temp table is created called #heros. It will be used to store all the relevant details of the staging records coming in and any details about the current record in the dimension, should it match.
- The first insert into the #hero table will be for records that match but that have changed. The unique column in staging and the dim is “Name”/“HeroName”. Therefore staging is joined to the dim based on this column and if either of the columns: FirstName or Surname have changed then a change must have occurred. Also note that if a match occurs to the dim, but the dim had previously been marked as deleted then a change is also written as it will need to be reinstated.
- The second insert, inserts rows that are only in staging. These are new rows that have never been seen before
- The final insert, inserts rows that no longer exist in the staging table. These are rows that will need to be soft deleted.
- The temp table is then used to merge into the dim. The ID of the dim is used to determine the type of match. If the ID matches then we know that this dim exists in “some” state. Therefore if it matches that values will be updated accordingly or it will be soft deleted. If it doesn’t match, then it’s consider new and inserted
- Finally any updates (other than soft deletes) are inserted into the history table. Note that if a soft delete is reinstated, then the record of the soft delete will be stored into the history table
- Correct the typo and then run the proc
- Delete the first row and run the proc again
SELECT * FROM Warehouse.dim_MarvelHero
We can now see that the old value for the typo is in the history table and that the first row in the main table is now marked has soft deleted.