1

My example is made up. I'd like to figure it out with apply() and lambda, though I've tried iterrows() too with no luck. I am trying to add a column to df2, that looks up values in df1, based on the 'item' combinations in each df2 row. Thanks in advance for your help.

import pandas as pd
import numpy as np
import random

names= ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame( np.arange(25).reshape(5,5), columns = names, index = names)

n=5
data = {'Item 1' : random.sample(names, n),
        'Item 2' : random.sample(names, n)}
df2 = pd.DataFrame(data)

#I can't get this to work. 
df2['New'] = df2.apply(lambda x: df1.loc[df2.loc[x, 'Item 1'], df2.loc[x, 'Item 2']], axis=1)

#Since this works, I assume my error with apply and lambda.  Thanks.
x=2
df1.loc[df2.loc[x, 'Item 1'], df2.loc[x, 'Item 2']]

enter image description here

enter image description here

Glenn
  • 131
  • 1
  • 11

2 Answers2

2

I would avoid using apply in general, and specifically having a loc call inside a lambda function. This will get very slow with time.

Use numpy's vectorization instead:

r = df2['Item 1'].map(dict(zip(df1.index, np.arange(len(df1.index)))))
c = df2['Item 2'].map(dict(zip(df1.columns, np.arange(len(df1.columns)))))

df2['new'] = df1.to_numpy()[r, c]
rafaelc
  • 57,686
  • 15
  • 58
  • 82
  • Thank you for this. It looks good and helps me stretch my skills. I tried it but must be using an older version as I get an error, 'DataFrame' object has no attribute 'to_numpy' It's for a class, so I'm stuck with the version. – Glenn Feb 10 '21 at 23:57
  • @Glenn use `.values` instead of `to_numpy()` then. – rafaelc Feb 10 '21 at 23:59
  • That works. I get the right answer, identical to the above solution using `apply` and `loc` Thank you again for the follow up. Truthfully, I am currently more comfortable with `apply` and `loc` but I fully understand your point with respect to efficiency. I need to study the line more to fully adopt this thinking. – Glenn Feb 11 '21 at 00:46
  • 1
    @Glenn Yes, it's the right answer. But mine is vectorize, which is way faster (for larger datasets). That's the key difference :p – rafaelc Feb 11 '21 at 00:47
  • @rafaelc I haven`t seen you around for a while, nice to see this vectorize solution ;) – Ben.T Mar 11 '21 at 02:39
1
df2['new'] = df2.apply(lambda x: df1.loc[x['Item 1'],x['Item 2']], axis=1)

output:

>>> df2
  Item 1 Item 2  new
0      D      A   15
1      B      B    6
2      A      D    3
3      E      C   22
4      C      E   14

Is that what you want? If its not, please add a sample output you want to see.

Behzad Shayegh
  • 323
  • 1
  • 10
  • Yes. This is good. Thank you very much. I moved the brackets a few times, but didn't see that. What gets bracketed can be confusing. Outside of lambda, I only used df2.loc[x, 'Item 1'] – Glenn Feb 10 '21 at 21:45