For this example i have 2 dataframes, the genre column in df1 is column 3 but in df2 it is column 2, also the header is slightly different. in my actual script i have to search the column names because the column location varies in each sheet it reads.
how do i recognise different header names as the same thing?
df1 = pd.DataFrame({'TITLE': ['The Matrix','Die Hard','Kill Bill'],
'VENDOR ID': ['1234','4321','4132'],
'GENRE(S)': ['Action', 'Adventure', 'Drama']})
df2 = pd.DataFrame({'TITLE': ['Toy Story','Shrek','Frozen'],
'Genre': ['Animation', 'Adventure', 'Family'],
'VENDOR ID': ['5678','8765','8576']})
column_names = ['TITLE','VENDOR ID','GENRE(S)']
appended_data = []
sheet1 = df1[df1.columns.intersection(column_names)]
appended_data.append(sheet1)
sheet2 = df2[df2.columns.intersection(column_names)]
appended_data.append(sheet2)
appended_data = pd.concat(appended_data, sort=False)
output:
TITLE VENDOR ID GENRE(S)
0 The Matrix 1234 Action
1 Die Hard 4321 Adventure
2 Kill Bill 4132 Drama
0 Toy Story 5678 NaN
1 Shrek 8765 NaN
2 Frozen 8576 NaN
desired output:
TITLE VENDOR ID GENRE(S)
0 The Matrix 1234 Action
1 Die Hard 4321 Adventure
2 Kill Bill 4132 Drama
0 Toy Story 5678 Animation
1 Shrek 8765 Adventure
2 Frozen 8576 Family