1

I have two dataframes:

mapping = pd.DataFrame({'idx': ['a','b','c'], 'val': [1, 2, 3]})
obs = pd.DataFrame({'obs': ['a','c','a','a','b','d']})

I would like for all observations in obs, that are present in mappings idx column, to get the value of mappings val column. If the value does not exist in mappings idx column, it should be discarded.

That is, I would like to end up with the following dataframe:

obs_mapped = pd.DataFrame({'obs': [1,3,1,1,2]})

Is there a handy way to do this with pandas?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
TylerD
  • 361
  • 1
  • 9

2 Answers2

2

Use Series.map with DataFrame.dropna:

out = (obs.assign(obs = obs['obs'].map(mapping.set_index('idx')['val']))
          .dropna(subset=['obs'])
          .astype(mapping.val.dtype))
print (out)
   obs
0    1
1    3
2    1
3    1
4    2

Or if use DataFrame.merge need multiple rename with sorting by original indices converted to column:

out = (obs.rename(columns={'obs':'idx'}).reset_index()
          .merge(mapping)
          .rename(columns={'val':'obs'})
          .sort_values('index', ignore_index=True)[['obs']])
print (out)
   obs
0    1
1    3
2    1
3    1
4    2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

This looks like a simple merge/map, however if you want to ensure having the values in order and mapping only the existing values, then pre-filter with isin before map:

out = (obs.loc[obs['obs'].isin(mapping['idx']), 'obs']
          .map(mapping.set_index('idx')['val'])
          .to_frame()
       )

Output:

   obs
0    1
1    3
2    1
3    1
4    2
mozway
  • 194,879
  • 13
  • 39
  • 75