In a data warehouse project how do I verify that my fact table loaded in a data warehouse DB through SSIS ETL load is correct with my staging table so that later I don't have incorrect reporting?
Asked
Active
Viewed 202 times
1 Answers
2
Good question, people creates different systems for this. So you understand this is one of most complex check/reconciliation process that developers built. I tried to give you three ways to do this. I would recommend first one because its easier and most efficient.
You can -
- Post Load reports: create reports which will reconcile data after load. Write SQL to compare source data and target data - compare count, compare amount, compare null values, compare daily data etc. If the comparison generates flag/alert - this means some issue in load.
- Check as you go : You can create some reusable function or mapping which will compare incoming source data and target data - compare count, compare amount, compare null values, compare daily data etc. and store in a table. A script will keep on checking those values and if there is any issue, script will notify support team.
- Pre process check : Before starting any ETL, you can check source data - like count, null values, daily count etc. to verify how is the data, if there is any file missing etc.

Koushik Roy
- 6,868
- 2
- 12
- 33
-
I have made the fact table where I have added all the surrogate key of dimensions tables (note business key are still in dimension tables) and joined these dimensions with the fact based on the surrogate key. I have included all the fact qualifying columns in fact table including invoice number of a dimension from staging tables can you please elaborate how to check if my design is correct and giving proper result. Also any tip while building fact table should keep in mind which column should include/exclude......thanks – Ramesh Nov 05 '20 at 08:51
-
How to check design is correct and giving proper result - your reports/business should conduct testing to validate proper data. While building fact, most imp are granularity of data and measures, related dimensions. And then comes index (mostly bitmap) on surrogate keys. – Koushik Roy Jan 20 '21 at 18:09