1

Question:

How can I update a pandas dataframe by the rows of another dataframe, while adding new columns if the other dataframe has columns not appearing in the original dataframe?

Example:

I have two dataframes:

original_df = pd.DataFrame([{'name' : 'ruben','age' : 25},{'name' : 'henk', 'age' : 26},{'name' : 'gijs', 'age' : 20}],index=[0,1,2])

    name  age
0   ruben  25
1   henk   26
2   gijs   20

and

update_df = pd.DataFrame([{'name' : 'henk','age' : 27,'hobby' : 'football'},{'name' : 'gijs', 'age' : 21,'hobby' : 'music'}],index=[1,2])

   name  age     hobby
1  henk   27  football
2  gijs   21     music

I want to update original_df with the rows of update_df. If they all would contain the same rows, i could do this as following:

original_df.update(update_df)

However, this results in:

    name  age
0  ruben  25
1   henk  27
2   gijs  21

How can i make sure the new column is added as well, so to get this result:

    name  age hobby
0  ruben  25
1   henk  27  football
2   gijs  21  music

Thank you in advance

Edit for Answer: This question was marked as a duplicate (by linking a general question about merging 101), but don't follow the link. The correct answer is in the comments:

update_df.combine_first(original_df)
user3053216
  • 777
  • 1
  • 9
  • 24

1 Answers1

0

You can use a left-join: "Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe" (Source). There is also a good thread about it on Stack Overflow: Pandas Merging 101

original_df = pd.DataFrame([{'name' : 'ruben','age' : 25},{'name' : 'henk', 'age' : 26},{'name' : 'gijs', 'age' : 20}],index=[0,1,2])
update_df = pd.DataFrame([{'name' : 'henk','age' : 27,'hobby' : 'football'},{'name' : 'gijs', 'age' : 21,'hobby' : 'music'}],index=[1,2])

df_merged = original_df.merge(update_df, on = 'name', how = 'left')
Jonas
  • 1,749
  • 1
  • 10
  • 18