I have two API queries that return approximately 400 columns of data each by about 10,0000 rows of data deep. The columns are of all sorts of different types and in different order. The data is not always the same. I have a mapping file that gives me a column-to-column mapping. There’s one id column in both data sets that’s supposed to be common to both data sets.
I need to merge the two data sets, align the mapped columns side by side, and add a difference.
Looking for some guidance on whether there's a built-in library that handles something like this?
Here’s a simplified version of the problem for illustration:
Df1
Id1 | Data1 | Data2 | Data3 | Data4 |
---|---|---|---|---|
A | 1 | 1 | 1 | 1 |
B | 2 | 2 | 2 | 2 |
C | 3 | 3 | 3 | 3 |
Df2
Id2 | Data4 | Data1 | Data2 | Data3 |
---|---|---|---|---|
A | 1 | 1 | 1 | 1 |
B | 2 | 2 | 2 | 2 |
C | 3 | 3 | 3 | 3 |
Mapping
Df2-data | Df1-data |
---|---|
Data1 | Data1 |
Data2 | Data2 |
Data3 | Data3 |
Data4 | Data4 |
Id2 | Id1 |
Df-out
Id1 | Id2 | Df1-Data1 | Df2-Data1 | Diff-Data1 | Df1-Data2 | Df2-Data2 | Diff-Data2 |
---|---|---|---|---|---|---|---|
A | A | 1 | 1 | 0 | 1 | 1 | 0 |
B | B | 2 | 2 | 0 | 2 | 2 | 0 |
C | C | 3 | 3 | 0 | 3 | 3 | 0 |