Merging dataframes on one column while replacing values with another column

Question

I have two dataframes:

mapping = pd.DataFrame({'idx': ['a','b','c'], 'val': [1, 2, 3]})
obs = pd.DataFrame({'obs': ['a','c','a','a','b','d']})

I would like for all observations in obs, that are present in mappings idx column, to get the value of mappings val column. If the value does not exist in mappings idx column, it should be discarded.

That is, I would like to end up with the following dataframe:

obs_mapped = pd.DataFrame({'obs': [1,3,1,1,2]})

Is there a handy way to do this with pandas?

`obs.merge(mapping.rename(columns={'idx': 'obs'}))[['val']]` — mozway, Aug 30 '23 at 08:19
@mozway - For me ouput is like `[1,1,1,3,2]`, for you working correct? — jezrael, Aug 30 '23 at 08:31

jezrael · Accepted Answer · 2023-08-30T08:26:24.303

Use Series.map with DataFrame.dropna:

out = (obs.assign(obs = obs['obs'].map(mapping.set_index('idx')['val']))
          .dropna(subset=['obs'])
          .astype(mapping.val.dtype))
print (out)
   obs
0    1
1    3
2    1
3    1
4    2

Or if use DataFrame.merge need multiple rename with sorting by original indices converted to column:

out = (obs.rename(columns={'obs':'idx'}).reset_index()
          .merge(mapping)
          .rename(columns={'val':'obs'})
          .sort_values('index', ignore_index=True)[['obs']])
print (out)
   obs
0    1
1    3
2    1
3    1
4    2

score 0 · Answer 2 · answered Aug 30 '23 at 08:41

This looks like a simple merge/map, however if you want to ensure having the values in order and mapping only the existing values, then pre-filter with isin before map:

out = (obs.loc[obs['obs'].isin(mapping['idx']), 'obs']
          .map(mapping.set_index('idx')['val'])
          .to_frame()
       )

Output:

Merging dataframes on one column while replacing values with another column

2 Answers2