I am looking to validate data across 2 tables, where the TableA (in the users source database - DB2) is a copy of the TableB (in the Data warehouse - DB2/Netezza).
Due to the process of copying the data across not being stable and being run on a daily basis instead of in real-time, there are discrepancies in the target DB, which I need to figure out and find. Also, do not wish to put a big load on the server, which might disrupt normal operations of the organization.
I am doing the following currently:
a) SUM checks on all numeric fields
b) MAX / MIN checks
c) Row Count checks (COUNT(*))
d) Checking the Data Types on both the DB's to be equal
e) Checking the DISTINCT count of the PRIMARY fields
f) Check MAX of Date on which the row was copied over (we put a new column to specify when that row was copied over)
Aside from the above, is there any other simpler way of doing it in a single SQL Query, which does not use up too many resources on the Database Servers?
Also, does anyone know of a way to match a row to row in DB2-DB2 and DB2-Netezza tables, like one can do with a SAS table?
Note:
a) Netezza does not have the concept of UNIQUE or PRIMARY, so Referential Integrity and UNIQUE checking is not valid.
b) All the tables have > 100M rows, so running a simple RowCount on each table can sometimes take upto 30-45 mins