-1

I'm extracting data from a website. There are 100+ data columns for each record. Each record can have a different amount of data columns depending on the data available but all the different columns have a unique name thus is possible to match them but they won't be in the same order as the number of columns varies. The total number of columns is unknown.

Suppose that each record has 4 data tables, all with the same index.

How I can merge records into the first data table while placing each data point into its column and adding new columns if needed.

Example:

Table 1:

| id | A | B | C   |

| 0  | 1 | 2 | 5   |

| 1  | 2 | 3 | nan |

Table 2:

| id | C | D |

| 1  | 5 | 6 |

| 2  | 7 | 8 |

Result:

| Id    | A     | B     | C     | D     |

| 0     | 1     | 2     | 5     | 6     |

| 1     | 2     | 3     | nan   | nan   |

| 2     | nan   | nan   | 7     | 8     |

Notes: - nan's are blank values - merging will always be blank + value - in this case C was merged and D was added. I could have 10's of columns that would need to be merged. - I'm not asking for code, just guidance on how to accomplish this in python

Tried using pandas join and merge but they are not designed for this use case.

EliadL
  • 6,230
  • 2
  • 26
  • 43

1 Answers1

0

Using pd.DataFrame.merge:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [2, 3], 'C': [5, None]})
df2 = pd.DataFrame({'C': [5, 7], 'D': [6, 8]})

common_columns = list(df1.columns & df2.columns)

df1.merge(df2, on=common_columns, how='outer')

Output:

     A    B    C    D
0  1.0  2.0  5.0  6.0
1  2.0  3.0  NaN  NaN
2  NaN  NaN  7.0  8.0
EliadL
  • 6,230
  • 2
  • 26
  • 43