Pandas replacing values in a column by values in another column

Question

Let's say I have the following dataframe X (ppid is unique):

    ppid  col2 ...
1   'id1'  '1'
2   'id2'  '2'
3   'id3'  '3'
...

I have another dataframe that serves as a mapping. Its ppid is the same as above and is unique, but it might not contain all of X's ppids:

    ppid  val
1   'id1' '5'
2   'id2' '6'

I would like to use the mapping dataframe to replace col2 in dataframe X wherever the ppids are equal (in reality, the key is multiple columns that are unique together), to get:

    ppid  col2 ...
1   'id1'  '5'
2   'id2'  '6'
3   'id3'  '3' # didn't change, as there's no match
...

Merge in the second DataFrame on `ppid` with `how='left'`, and use `fillna` to fill in the gaps — trianta2, Jun 15 '21 at 16:07

score 2 · Answer 1 · answered Jun 15 '21 at 16:08


Try using map with set_index:

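import pandas as pd

df_x = pd.DataFrame({'ppid':['id1','id2','id3'], 'col2':[*'123']})

df_a = pd.DataFrame({'ppid':['id1','id2'], 'val':[*'56']})

# Look up each ppid in the mapping; unmatched rows become NaN and fall back to the old col2
df_x['col2'] = df_x['ppid'].map(df_a.set_index('ppid')['val']).fillna(df_x['col2'])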

Output:

  ppid col2
0  id1    5
1  id2    6
2  id3    3


Scott Boston


AttributeError: 'DataFrame' object has no attribute 'map' – Jjang Jun 16 '21 at 06:48
`df_x['ppid']` is not a DataFrame, it's a Series. This code works on your sample. – Corralien Jun 16 '21 at 07:19

df_x has more columns in reality, which were omitted for brevity... is it a small fix? – Jjang Jun 16 '21 at 08:00
This doesn't matter. We are only assigning to one column in df_x, 'col2'. All the other columns in df_x stay the same and are not affected. – Scott Boston Jun 16 '21 at 12:58
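
The AttributeError in the comments above is consistent with calling `.map` on a DataFrame, which is what a list-style selection such as `df[['a', 'b']]` returns. That can happen when the real key spans several columns, as the question mentions. (`DataFrame.map` does exist from pandas 2.1, but it is element-wise and would not do this lookup either.) A minimal sketch of that case using a left merge instead of `map`; the key columns `k1` and `k2` are hypothetical and not from the original post:

```python
import pandas as pd

# Hypothetical two-column key (k1, k2); these names are illustrative only.
df_x = pd.DataFrame({'k1': ['a', 'a', 'b'], 'k2': [1, 2, 1], 'col2': ['1', '2', '3']})
df_a = pd.DataFrame({'k1': ['a', 'a'], 'k2': [1, 2], 'val': ['5', '6']})

# Selecting one label returns a Series; selecting a list returns a DataFrame.
print(type(df_x['k1']).__name__)
print(type(df_x[['k1', 'k2']]).__name__)

# A left merge on both key columns keeps every row of df_x, in order.
# This assumes (k1, k2) is unique in df_a, so no rows are duplicated.
merged = df_x.merge(df_a, on=['k1', 'k2'], how='left')

# Take val where a match exists, otherwise keep the old col2.
df_x['col2'] = merged['val'].fillna(merged['col2']).to_numpy()
print(df_x)
```

Any other columns in `df_x` are untouched, because only `col2` is assigned.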

Corralien · Answer 2 · 2021-06-16T07:14:28.783


Input data:

>>> dfX
    ppid col1 col2 col3
0  'id1'  'X'  '1'  'A'
1  'id2'  'Y'  '2'  'B'
2  'id3'  'Z'  '3'  'C'

>>> dfM
    ppid  val
0  'id1'  '5'
1  'id2'  '6'

dfX is your first dataframe and dfM is your mapping dataframe:

>>> dfM.rename(columns={'val': 'col2'}).combine_first(dfX).loc[:, dfX.columns]

    ppid col1 col2 col3
0  'id1'  'X'  '5'  'A'
1  'id2'  'Y'  '6'  'B'
2  'id3'  'Z'  '3'  'C'
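
`combine_first` aligns on the index, so the one-liner above relies on matching rows carrying the same row labels in both frames. A sketch that aligns on `ppid` instead, assuming `ppid` is unique in both frames as the question states; the mapping rows are deliberately given in a different order:

```python
import pandas as pd

dfX = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'],
                    'col1': ['X', 'Y', 'Z'],
                    'col2': ['1', '2', '3'],
                    'col3': ['A', 'B', 'C']})
# Mapping rows in a different order than dfX, to show that labels drive the match.
dfM = pd.DataFrame({'ppid': ['id2', 'id1'], 'val': ['6', '5']})

result = (
    dfM.rename(columns={'val': 'col2'})    # name the mapping column like its target
       .set_index('ppid')                  # align on the key, not on row position
       .combine_first(dfX.set_index('ppid'))
       .reset_index()
       .loc[:, dfX.columns]                # restore the original column order
)
print(result)
```

The row order of the result follows the combined index rather than the order of `dfX`, so re-sort afterwards if the original order matters.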

edited Jun 16 '21 at 07:14

answered Jun 15 '21 at 16:19

Corralien


I updated my answer. I think you have a column ordering problem, don't you? – Corralien Jun 16 '21 at 07:15

score 2 · Accepted Answer · answered Jun 15 '21 at 16:25


See Jeremy Z's answer for further explanation of this solution: https://stackoverflow.com/a/55631906/16235276

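# update() aligns on the index, so make ppid the index of both frames
df1 = df1.set_index('ppid')
df2 = df2.set_index('ppid')
# Overwrites df1 in place with non-NA values from df2, for columns present in both frames
df1.update(df2)
df1.reset_index(inplace=True)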
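
With the question's data the mapping column is called `val`, while the column to overwrite is `col2`. A runnable sketch of the same approach under that assumption, with a rename added so the two names match (the `other` column is made up to show that unrelated columns survive):

```python
import pandas as pd

df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'],
                    'col2': ['1', '2', '3'],
                    'other': ['x', 'y', 'z']})   # illustrative extra column
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'val': ['5', '6']})

df1 = df1.set_index('ppid')
# Rename val -> col2 so update() has a shared column to overwrite.
df2 = df2.set_index('ppid').rename(columns={'val': 'col2'})

df1.update(df2)          # modifies df1 in place, returns None
df1 = df1.reset_index()
print(df1)
```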


takimas


score 0 · Answer 4 · answered Jun 15 '21 at 16:09


First merge your dataframes, then use pd.Series.combine_first:

df1 = pd.merge(df1, df2, how='left', on='ppid')
df1['col2'] = df1.val.combine_first(df1.col2)
del df1['val']
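
A runnable version of the snippet above using the question's sample data. The `validate='m:1'` argument is an addition that is not in the original answer; it asks pandas to raise if `ppid` is duplicated in the mapping, which would otherwise silently multiply rows:

```python
import pandas as pd

df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'], 'col2': ['1', '2', '3']})
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'val': ['5', '6']})

# validate='m:1' raises MergeError if ppid is not unique in df2.
df1 = pd.merge(df1, df2, how='left', on='ppid', validate='m:1')

# Prefer val; fall back to the existing col2 where val is NaN.
df1['col2'] = df1['val'].combine_first(df1['col2'])
df1 = df1.drop(columns='val')
print(df1)
```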


tomtomfox


score 0 · Answer 5 · answered Jun 15 '21 at 16:11

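import pandas as pd

df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'], 'col2': ['1', '2', '3']})
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'col2': ['5', '6']})

# Both frames have a col2, so the merge suffixes them as col2_x (df1) and col2_y (df2)
merged = df1.merge(df2, how='left', on='ppid')
# Assign back rather than calling fillna(inplace=True) on a column selection
merged['col2_y'] = merged['col2_y'].fillna(merged['col2_x'])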

merged

  ppid col2_x col2_y
0  id1      1      5
1  id2      2      6
2  id3      3      3
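
The merged frame above still carries both suffixed columns. A short sketch of collapsing them back into a single `col2`, starting again from the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'], 'col2': ['1', '2', '3']})
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'col2': ['5', '6']})

merged = df1.merge(df2, how='left', on='ppid')
# Fill unmatched rows in the mapped column with the original values.
merged['col2_y'] = merged['col2_y'].fillna(merged['col2_x'])

# Drop the old values and restore the original column name.
result = merged.drop(columns='col2_x').rename(columns={'col2_y': 'col2'})
print(result)
```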
