1

Let's say I have the following dataframe X (ppid is unique):

    ppid  col2 ...
1   'id1'  '1'
2   'id2'  '2'
3   'id3'  '3'
...

I have another dataframe which serves as a mapping. ppid is same as above and unique, however it might not contain all X's ppids:

    ppid  val
1   'id1' '5'
2   'id2' '6'

I would like to use the mapping dataframe to switch col2 in dataframe X according to where the ppids are equal (in reality, they're multiple columns which are unique together), to get:

    ppid  col2 ...
1   'id1'  '5'
2   'id2'  '6'
3   'id3'  '3' # didn't change, as there's no match
...
Jjang
  • 11,250
  • 11
  • 51
  • 87

5 Answers5

2

Try using map with set_index:

df_x = pd.DataFrame({'ppid':['id1','id2','id3'], 'col2':[*'123']})

df_a = pd.DataFrame({'ppid':['id1','id2'], 'val':[*'56']})

df_x['col2'] = df_x['ppid'].map(df_a.set_index('ppid')['val']).fillna(df_x['col2'])

Output:

  ppid col2
0  id1    5
1  id2    6
2  id3    3
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
2

Input data:

>>> dfX
    ppid col1 col2 col3
0  'id1'  'X'  '5'  'A'
1  'id2'  'Y'  '6'  'B'
2  'id3'  'Z'  '3'  'C'

>>> dfM
    ppid  val
0  'id1'  '5'
1  'id2'  '6'

dfX is your first dataframe and dfM is your mapping dataframe:

>>> dfM.rename(columns={'val': 'col2'}).combine_first(dfX).loc[:, df.columns]

    ppid col1 col2 col3
0  'id1'  'X'  '5'  'A'
1  'id2'  'Y'  '6'  'B'
2  'id3'  'Z'  '3'  'C'
Corralien
  • 109,409
  • 8
  • 28
  • 52
2

Have a look at Jeremy Z answer on this post, for further explanation on solution https://stackoverflow.com/a/55631906/16235276

df1 = df1.set_index('ppid')
df2 = df2.set_index('ppid')
df1.update(df2)
df1.reset_index(inplace=True)
takimas
  • 36
  • 2
0

First merge your dataframes then use pd.Series.combine_first

df1 = pd.merge(df1, df2, how='left', on='ppid')
df1['col2'] = df1.val.combine_first(df1.col2)
del df1['val']
tomtomfox
  • 284
  • 1
  • 7
0
df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'], 'col2': ['1', '2', '3']})
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'col2': ['5', '6']})

merged = df1.merge(df2, how='left', on='ppid')
merged['col2_y'].fillna(merged['col2_x'], inplace=True)

merged
  ppid col2_x col2_y
0  id1      1      5
1  id2      2      6
2  id3      3      3
trianta2
  • 3,952
  • 5
  • 36
  • 52