
I want to replace values in the df dataframe using the lookup dataframe.

import pandas as pd

df = pd.DataFrame({'no1': [20, 20, 40, 10, 50],
                   'no2': [50, 20, 10, 40, 50],
                   'no3': [30, 10, 50, 40, 50]})

   no1  no2 no3
0   20  50  30
1   20  20  10
2   40  10  50
3   10  40  40
4   50  50  50

lookup = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'E'],
                       'id': [10, 20, 30, 40, 50]})

    label id
0   A     10
1   B     20
2   C     30
3   D     40
4   E     50

Particularly, I'd like to have:

   no1  no2 no3
0   B   E   C
1   B   B   A
2   D   A   E
3   A   D   D
4   E   E   E

What is the best way to do it using pandas?

P.S.: I found a very similar question here, but I do not quite follow it as it is in R. A Python solution is appreciated.


3 Answers


You could use replace with a dictionary:

import pandas as pd

df = pd.DataFrame({'no1': [20, 20, 40, 10, 50],
                   'no2': [50, 20, 10, 40, 50],
                   'no3': [30, 10, 50, 40, 50]})

lookup = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'E'],
                       'id': [10, 20, 30, 40, 50]})

result = df.replace(dict(zip(lookup.id, lookup.label)))

print(result)

Output

  no1 no2 no3
0   B   E   C
1   B   B   A
2   D   A   E
3   A   D   D
4   E   E   E
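
As a side note, if only some columns should be translated, `replace` also accepts a nested dictionary keyed by column name. A small sketch on the same data (restricting the mapping to `no1` here is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'no1': [20, 20, 40, 10, 50],
                   'no2': [50, 20, 10, 40, 50],
                   'no3': [30, 10, 50, 40, 50]})
lookup = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'E'],
                       'id': [10, 20, 30, 40, 50]})

mapping = dict(zip(lookup.id, lookup.label))

# Nested dict {column: {old: new}} limits the replacement to 'no1';
# 'no2' and 'no3' keep their original numeric values.
partial = df.replace({'no1': mapping})
print(partial)
```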

You can construct a dictionary and then use np.vectorize:

d = lookup.set_index('id')['label'].to_dict()  # or d = dict(np.fliplr(lookup.values))
df.iloc[:] = np.vectorize(d.get)(df.values)

print(df)

  no1 no2 no3
0   B   E   C
1   B   B   A
2   D   A   E
3   A   D   D
4   E   E   E
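
If you would rather not overwrite `df` in place via `df.iloc[:]`, a variation (assuming the same `d` dictionary) is to wrap the vectorized result in a new DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'no1': [20, 20, 40, 10, 50],
                   'no2': [50, 20, 10, 40, 50],
                   'no3': [30, 10, 50, 40, 50]})
lookup = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'E'],
                       'id': [10, 20, 30, 40, 50]})

d = dict(zip(lookup.id, lookup.label))

# np.vectorize applies d.get element-wise over the underlying array;
# building a new frame leaves the original df untouched.
result = pd.DataFrame(np.vectorize(d.get)(df.values),
                      index=df.index, columns=df.columns)
print(result)
```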
  • Nice one with numpy vectorize. Maybe use d = dict(np.fliplr(lookup.values)) to avoid slow set_index? – b2002 Nov 02 '18 at 18:03
  • 1
    For a similar task on my moderately powerful laptop, I used `np.vectorize` on a medium-sized df (50k rows, 10 columns) and a large lookup table (4 million rows of name-id pairs), and it worked almost instantaneously. However, on a much larger df it broke: `Unable to allocate 17.8 TiB for an array with shape (3400599, 25) and data type – mluerig Jan 17 '21 at 15:47
  • 1
    This one worked like a charm for me. – Nesha25 Jan 20 '23 at 01:16

First create a Series with set_index.

Then use replace, though it can be slow on a large DataFrame:

s = lookup.set_index('id')['label']
df = df.replace(s)

Solutions that return None or NaN for non-matched values, using applymap or apply with map:

df = df.applymap(s.get)

Or:

df = df.apply(lambda x: x.map(s))

Or:

for c in df.columns:
    df[c] = df[c].map(s)

print(df)
  no1 no2 no3
0   B   E   C
1   B   B   A
2   D   A   E
3   A   D   D
4   E   E   E
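
To illustrate the None/NaN behavior mentioned above, here is a small sketch with a made-up id (99) that has no entry in the lookup table; `fillna` can then restore the original values where no label exists:

```python
import pandas as pd

lookup = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'E'],
                       'id': [10, 20, 30, 40, 50]})
s = lookup.set_index('id')['label']

# 99 is a hypothetical id absent from the lookup table
df = pd.DataFrame({'no1': [20, 99], 'no2': [50, 10]})

mapped = df.apply(lambda x: x.map(s))
print(mapped)            # the unmatched 99 becomes NaN

# keep the original value wherever no label was found
kept = mapped.fillna(df)
print(kept)
```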