Setting value of columns in one dataframe to another dataframe column based on condition

Question

I have a dataframe df1 :-

id	Group	Family	Bonus
1	tri23_1	Laavin	A
2	hsgç_T2	Grendy	B
3	bbbj-1Y_jn	Fantol	C
4	hsgç_T2	Gondow	D

and have a dataframe df2 :-

id	Group	Family	Bonus
1	tri	Brendy	B
2	hsgç_T2	Fantol	A
5	bbbj-1Y_jn	Fantol	E
6	hsgç_T3	Gondow	D

For id in df1 that are present in df2,I want to set Group ,Family,Bonus columns alone of df2 in df1. Expected df1:-

id	Group	Family	Bonus
1	tri	Brendy	B
2	hsgç_T2	Fantol	A
3	bbbj-1Y_jn	Fantol	C
4	hsgç_T2	Gondow	D

Does this answer your question? [Python pandas - particular merge/replacement](https://stackoverflow.com/questions/25762902/python-pandas-particular-merge-replacement) — Vladimir Fokow, Aug 23 '22 at 16:44
@VladimirFokowno it does not. the `update` method updates all columns, but I specifically want only a few columns be updated — Scope, Aug 23 '22 at 16:50

Vladimir Fokow · Accepted Answer · 2022-08-28T16:32:19.583

shared_columns = ['Group', 'Family', 'Bonus']

One-liner:

df1.update(df1.set_index('id')[shared_columns]
              .align(df2.set_index('id'), join='left')[1]
              .set_axis(df1.index, axis=0))

My original solution:

replacements = (pd.merge(df1, df2, on='id', how='left')
                  .iloc[:, -len(shared_columns):])
replacements.columns = shared_columns

df1.update(replacements)

Example:

df1 = pd.DataFrame([[1,  1, 0.1, 1], 
                    [2,  2, 0.2, 2], 
                    [3,  3, 0.3, 3]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3,  30, 300, 3], 
                    [1,  10, 100, 1], 
                    [4,  40, 400, 4]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])

df1:

   id  Group  Family  Bonus
0   1      1     0.1      1
1   2      2     0.2      2
2   3      3     0.3      3

df2:

   id  Group  Family  Bonus
0   3     30     300      3
1   1     10     100      1
2   4     40     400      4

pd.merge(df1, df2, on='id', how='left' performs a left-merge, on the column id:

   id  Group_x  Family_x  Bonus_x  Group_y  Family_y  Bonus_y
0   1        1       0.1        1     10.0     100.0      1.0
1   2        2       0.2        2      NaN       NaN      NaN
2   3        3       0.3        3     30.0     300.0      3.0

replacements (after selecting the last columns, and renaming them):

   Group  Family  Bonus
0   10.0   100.0    1.0
1    NaN     NaN    NaN
2   30.0   300.0    3.0

Resulting df1:

   id  Group  Family  Bonus
0   1   10.0   100.0    1.0
1   2    2.0     0.2    2.0
2   3   30.0   300.0    3.0

Another way to select the right columns after merging:

By default, .merge uses suffixes=('_x', '_y'). So to select the columns of the right data frame, we can append '_y' to each its column name:

merged = pd.merge(df1, df2, on='id', how='left')
right_colnames = [colname + '_y' for colname in shared_columns]
replacements = merged[right_colnames]

   Group_y  Family_y  Bonus_y
0     10.0     100.0      1.0
1      NaN       NaN      NaN
2     30.0     300.0      3.0

Another solution:

temp_df = df1.reset_index().set_index('id')
temp_df.update(df2.set_index('id')[shared_columns])
df1 = temp_df.reset_index().set_index('index').rename_axis(None, axis=0)

what is I do not have these columns in the end? how can i modify `-len` — Scope, Aug 24 '22 at 07:18
Just found out about the [`.align()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.align.html) method - it can be used instead or mergind and then selecting — Vladimir Fokow, Aug 26 '22 at 17:20
where does it actually subset the columns/variables for align? — Scope, Aug 28 '22 at 13:15

score 0 · Answer 2 · answered Aug 23 '22 at 18:16

if setting index on DF is acceptable, then this is one way to accomplish it. Using update

df.set_index('id', inplace=True)
df.update(df2.set_index('id'))
df

    Group       Family  Bonus
id          
1   tri         Brendy  B
2   hsgç_T2     Fantol  A
3   bbbj-1Y_jn  Fantol  C
4   hsgç_T2     Gondow  D

Setting value of columns in one dataframe to another dataframe column based on condition

2 Answers2