shared_columns = ['Group', 'Family', 'Bonus']
One-liner:
df1.update(df1.set_index('id')[shared_columns]
.align(df2.set_index('id'), join='left')[1]
.set_axis(df1.index, axis=0))
My original solution:
replacements = (pd.merge(df1, df2, on='id', how='left')
.iloc[:, -len(shared_columns):])
replacements.columns = shared_columns
df1.update(replacements)
Example:
df1 = pd.DataFrame([[1, 1, 0.1, 1],
[2, 2, 0.2, 2],
[3, 3, 0.3, 3]],
columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3, 30, 300, 3],
[1, 10, 100, 1],
[4, 40, 400, 4]],
columns=['id', 'Group', 'Family', 'Bonus'])
df1
:
id Group Family Bonus
0 1 1 0.1 1
1 2 2 0.2 2
2 3 3 0.3 3
df2
:
id Group Family Bonus
0 3 30 300 3
1 1 10 100 1
2 4 40 400 4
pd.merge(df1, df2, on='id', how='left'
performs a left-merge, on the column id
:
id Group_x Family_x Bonus_x Group_y Family_y Bonus_y
0 1 1 0.1 1 10.0 100.0 1.0
1 2 2 0.2 2 NaN NaN NaN
2 3 3 0.3 3 30.0 300.0 3.0
replacements
(after selecting the last columns, and renaming them):
Group Family Bonus
0 10.0 100.0 1.0
1 NaN NaN NaN
2 30.0 300.0 3.0
Resulting df1
:
id Group Family Bonus
0 1 10.0 100.0 1.0
1 2 2.0 0.2 2.0
2 3 30.0 300.0 3.0
Another way to select the right columns after merging:
By default, .merge uses suffixes=('_x', '_y')
. So to select the columns of the right data frame, we can append '_y'
to each its column name:
merged = pd.merge(df1, df2, on='id', how='left')
right_colnames = [colname + '_y' for colname in shared_columns]
replacements = merged[right_colnames]
Group_y Family_y Bonus_y
0 10.0 100.0 1.0
1 NaN NaN NaN
2 30.0 300.0 3.0
Another solution:
temp_df = df1.reset_index().set_index('id')
temp_df.update(df2.set_index('id')[shared_columns])
df1 = temp_df.reset_index().set_index('index').rename_axis(None, axis=0)