2

I have two dataframes like this:

df1 = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                    'ID2':['0','10','80','0','0','0']})
df2 = pd.DataFrame({'ID1':['A','D','E','F'],
                    'ID2':['50','30','90','50'],
                    'aa':['1','2','3','4']})

df1,df2

I want to insert ID2 in df2 into ID2 in df1, and at the same time insert aa into df1 according to ID1 to obtain a new dataframe like this:

df_result = pd.DataFrame({'ID1':['A','B','C','D','E','F'],
                       'ID2':['50','10','80','30','90','50'],
                         'aa':['1','NaN','NaN','2','3','4']})

df_result

I've tried to use merge, but it didn't work.

Jiao
  • 219
  • 1
  • 10

3 Answers3

3

You can use combine_first on the DataFrame after setting the index to ID1:

(df2.set_index('ID1')  # values of df2 have priority in case of overlap
    .combine_first(df1.set_index('ID1')) # add missing values from df1
    .reset_index()     # reset ID1 as column
)

output:

  ID1 ID2   aa
0   A  50    1
1   B  10  NaN
2   C  80  NaN
3   D  30    2
4   E  90    3
5   F  50    4
mozway
  • 194,879
  • 13
  • 39
  • 75
1

Use df.merge with Series.combine_first:

In [568]: x = df1.merge(df2, on='ID1', how='left')
In [571]: x['ID2'] = x.ID2_y.combine_first(x.ID2_x)
In [574]: x.drop(['ID2_x', 'ID2_y'], 1, inplace=True)

In [575]: x
Out[575]: 
  ID1   aa ID2
0   A    1  50
1   B  NaN  10
2   C  NaN  80
3   D    2  30
4   E    3  90
5   F    4  50

OR use df.filter with df.ffill:

In [568]: x = df1.merge(df2, on='ID1', how='left')
In [597]: x['ID2'] = x.filter(like='ID2').ffill(axis=1)['ID2_y']
In [599]: x.drop(['ID2_x', 'ID2_y'], 1, inplace=True)
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
  • You don't need the `merge`, [`combine_first` is sufficient if you use the proper index](https://stackoverflow.com/a/70958995/16343464) ;) – mozway Feb 02 '22 at 16:15
1

Try this:

new_df = df1.assign(ID2=df1['ID2'].replace('0', np.nan)).merge(df2, on='ID1', how='left').pipe(lambda g: g.assign(ID2=g.filter(like='ID2').bfill(axis=1).iloc[:, 0]).drop(['ID2_x', 'ID2_y'], axis=1))

Output:

>>> new_df
  ID1   aa ID2
0   A    1  50
1   B  NaN  10
2   C  NaN  80
3   D    2  30
4   E    3  90
5   F    4  50