2

I am keen find an efficient design solution, that Ralph Kimball's model propose to handle historical load of SCD Type 2 dimension, involving multiple source tables, without using a PIT Table.

The source data is comprises of many CDC enabled tables, with several intra day changes. The target Data Warehouse is expected to have a daily grain.

In past i have used Point-In-Time (PIT) tables, that builds a end-of-day snapshot using the business keys of source tables involved for dimension build. This greatly optimizes Dimension load capabilities when source tables involved are both wide and deep. My organization prefers to use Kimball's approach, and as PIT is a DV concepts, and hence it is off-table.

Please share alternative approaches. Some generic examples or any useful tips that you might have used will be great to hear.


Absolutely I understand PIT works well to load historical data for initial load (unfortunately it seems Kimball hasn't mentioned it and hence solution designer is not keen to allow using the PITs to load )

for example, i have to create a Member dimension having an attribute Member_Status which is derived based on the all the account the member holds at a given point in time.

To give a very simple example say if : on a given date D1 a member has 1 open and 1 closed account member status is: active next day D2, he has 2 closed account , member status is: exited next day D3, he opens another account, member status is: active again

Now the whole thing has to be loaded retrospectively (eg. last 10 year of data as SCD2) Where i can see 2 option

  1. run the load in loop for every day from the start date ( Not an efficient solution )
  2. Create a PIT ( Not allowed :( )

Is there any other option ?


"The source data is comprises of many CDC enabled tables, with several intra day changes. The target Data Warehouse is expected to have a daily grain. " just to explain this a bit more

say there is a member (entity) and account (entity) Member:Account relationship is 1:Many

During the day a member can have several updates ( like correction of Name, DOB, Address etc) Same true for the Accounts CDC will capture all of those changes, while for the target DWH only the end of day snapshot need to be captured. Well to make it happen is not a big roadblock, my main question revolves around initial load and efficient strategy to handle it.

DataGuy
  • 21
  • 2
  • The source data is comprises of many CDC enabled tables, with several intra day changes. The target Data Warehouse is expected to have a daily grain. Can you please explain more ? with a sample data maybe ? – Amira Bedhiafi Feb 28 '21 at 21:16
  • I'm not sure I understand the question. You want to build a SCD2 dimension from many sources without using PIT tables. PIT is not just a concept owned by DV. If it works for you, use it. It's strange to exclude a design approach because it seems to belong in a different camp. In fact if all of your dimensions change every day and you are using SCD2.... your dimensional table is effectively a daily snapshot table anyway. – Nick.Mc Feb 28 '21 at 23:04

0 Answers0