Question:
How can I update a pandas dataframe by the rows of another dataframe, while adding new columns if the other dataframe has columns not appearing in the original dataframe?
Example:
I have two dataframes:
original_df = pd.DataFrame([{'name' : 'ruben','age' : 25},{'name' : 'henk', 'age' : 26},{'name' : 'gijs', 'age' : 20}],index=[0,1,2])
name age
0 ruben 25
1 henk 26
2 gijs 20
and
update_df = pd.DataFrame([{'name' : 'henk','age' : 27,'hobby' : 'football'},{'name' : 'gijs', 'age' : 21,'hobby' : 'music'}],index=[1,2])
name age hobby
1 henk 27 football
2 gijs 21 music
I want to update original_df with the rows of update_df. If they all would contain the same rows, i could do this as following:
original_df.update(update_df)
However, this results in:
name age
0 ruben 25
1 henk 27
2 gijs 21
How can i make sure the new column is added as well, so to get this result:
name age hobby
0 ruben 25
1 henk 27 football
2 gijs 21 music
Thank you in advance
Edit for Answer: This question was marked as a duplicate (by linking a general question about merging 101), but don't follow the link. The correct answer is in the comments:
update_df.combine_first(original_df)