
In my previous question, "How to efficiently replace items between Dataframes in pandas?", I got a solution using map() that works, but it overwrites items that do not match with NaN.

Suppose I have two DataFrames:

import pandas as pd

df = pd.DataFrame({'Ages':[20, 22, 57, 250], 'Label':[1,1,2,7]})
label_df = pd.DataFrame({'Label':[1,2,3], 'Description':['Young','Old','Very Old']})

I want to replace the label values in df with the descriptions from label_df, but where a label has no match in label_df, keep the original value.

What I get with df['Label'] = df['Label'].map(label_df.set_index('Label')['Description']):

{'Ages':[20, 22, 57, 250], 'Label':['Young','Young','Old', nan]}

Wanted result:

{'Ages':[20, 22, 57, 250], 'Label':['Young','Young','Old', 7]}
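
To see where the NaN comes from, it may help to check which labels have no entry in the lookup Series. The following is a minimal sketch using the two DataFrames above; the names lookup, has_match, unmatched and missing_after_map are illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'Ages': [20, 22, 57, 250], 'Label': [1, 1, 2, 7]})
label_df = pd.DataFrame({'Label': [1, 2, 3], 'Description': ['Young', 'Old', 'Very Old']})

# Lookup Series indexed by label: 1 -> 'Young', 2 -> 'Old', 3 -> 'Very Old'
lookup = label_df.set_index('Label')['Description']

# Labels in df that have no entry in the lookup index
has_match = df['Label'].isin(lookup.index)
unmatched = df.loc[~has_match, 'Label'].tolist()

# map() leaves a missing value exactly where there is no match
missing_after_map = df['Label'].map(lookup).isna().tolist()

print(unmatched)
print(missing_after_map)
```

Only the rows flagged as missing here are the ones whose original value needs to be kept.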
Moshe
  • You want the Non-Exhaustive Mapping in JohnE's solution. If you need a non-exhaustive mapping that sets things to `NaN` like `'Missing'` see my solution there. – ALollz Jun 15 '21 at 14:40

1 Answer


You can chain .fillna() with the original column after .map() to restore the original values wherever there is no match:

df['Label'] = df['Label'].map(label_df.set_index('Label')['Description']).fillna(df['Label'])

Alternatively, you can use .replace(), which leaves non-matching values unchanged instead of setting them to NaN:

df['Label'] = df['Label'].replace(dict(zip(label_df['Label'], label_df['Description'])))

Result:

print(df)

   Ages  Label
0    20  Young
1    22  Young
2    57    Old
3   250      7
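
As a self-contained check, here is a minimal sketch that runs the .replace() approach on the sample data without modifying df. It also shows a third variant that is not part of the answer above: map() with a callable that falls back to the original value through dict.get. The names mapping, replaced and mapped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Ages': [20, 22, 57, 250], 'Label': [1, 1, 2, 7]})
label_df = pd.DataFrame({'Label': [1, 2, 3], 'Description': ['Young', 'Old', 'Very Old']})

# Plain dict lookup table: {1: 'Young', 2: 'Old', 3: 'Very Old'}
mapping = dict(zip(label_df['Label'], label_df['Description']))

# From the answer: replace() leaves values without a key in the dict untouched
replaced = df['Label'].replace(mapping)

# Assumed alternative (not from the answer): map() with a callable
# that returns the original value when the key is missing
mapped = df['Label'].map(lambda value: mapping.get(value, value))

print(replaced.tolist())
print(mapped.tolist())
```

Both Series keep the unmatched 7 and end up holding a mix of strings and integers, so the resulting column is no longer purely numeric.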
SeaBean