Source | Target |
---|---|
![]() | ![]() |
Two tables are joined on the composite key `ID1` and `ID2` using pandas `merge`. We have a data testing tool that does the data analysis. It performs the merge and filters whatever rows are not present on the left or right side of the merge into an external `Missing in Source` or `Missing in Target` table.
Missing in Target | Missing in Source |
---|---|
![]() | ![]() |
In this example, the second row of `Source`, with composite key `ABC,345`, is missing in `Target`, so that row is filtered into `Missing in Target`. Similarly, the second row of `Target`, with composite key `ABC,222`, is missing in `Source`, so it is filtered into `Missing in Source`.
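To make the setup concrete, here is a minimal sketch of how such a split could be done with an outer merge and `indicator=True` (which, as far as I know, has been available since well before pandas 0.23). The sample frames are made up for illustration: only the keys `ABC,345` and `ABC,222` come from the example above, and the `ABC,123` row is an assumption. The actual testing tool may do this differently.

```python
import pandas as pd

# Hypothetical sample data; only ABC,345 and ABC,222 come from the example.
source = pd.DataFrame({"ID1": ["ABC", "ABC"], "ID2": [123, 345]})
target = pd.DataFrame({"ID1": ["ABC", "ABC"], "ID2": [123, 222]})

keys = ["ID1", "ID2"]

# Outer merge on the composite key; the _merge column records which side
# each row came from: "left_only", "right_only", or "both".
merged = source.merge(target, on=keys, how="outer", indicator=True)

# Rows only in Source are missing in Target, and vice versa.
missing_in_target = (
    merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)
)
missing_in_source = (
    merged.loc[merged["_merge"] == "right_only", keys].reset_index(drop=True)
)
```

With this sample data, `missing_in_target` holds the `ABC,345` row and `missing_in_source` holds the `ABC,222` row.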
For the `Missing ...` tables, the business wants to know why exactly the rows are missing: which part of the composite key caused each row to be missing? For example, for the row with `ABC,345` in `Missing in Target` above, `ABC` was present in both tables' rows but `345` wasn't. Therefore, `ID2` with value `345` is the guilty key for this row.
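One possible reading of "guilty key" is: a key column whose value in the missing row never appears in the same column of the other table. A rough sketch under that assumption follows. The function name `guilty_keys`, the output column name, and the sample data (apart from `ABC,345`) are hypothetical, and it only uses `isin`, which should be fine on pandas 0.23.

```python
import pandas as pd

def guilty_keys(missing, other, keys):
    """Hypothetical helper: for each row of `missing`, list the key columns
    whose value does not occur anywhere in that column of `other`."""
    # One boolean column per key: True where the value is absent from `other`.
    flags = pd.DataFrame(
        {k: ~missing[k].isin(other[k]) for k in keys},
        index=missing.index,
    )
    result = missing.copy()
    result["guilty_keys"] = [
        ", ".join(k for k in keys if row[k]) for _, row in flags.iterrows()
    ]
    return result

# Hypothetical data: the ABC,345 row from Missing in Target, checked
# against a made-up Target table.
missing_in_target = pd.DataFrame({"ID1": ["ABC"], "ID2": [345]})
target = pd.DataFrame({"ID1": ["ABC", "ABC"], "ID2": [123, 222]})

explained = guilty_keys(missing_in_target, target, ["ID1", "ID2"])
```

Note the limitation of this interpretation: if every key value exists somewhere in the other table but never together in the same row, `guilty_keys` comes back empty, so that case would need its own rule.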
I should also mention that, unfortunately, the version of pandas we are using is 0.23.