How to efficiently do lookup in dataframe based on multiple columns

Question

I have the follow two dataframes, and I need calculate value column in the df2 based on df1

df1

col1    col2 col3 value

Chicago M     26   54

NY      M     20   21
...

df2

col1 col2 col3 value

NY     M    20   ? (should be 21 based on above dataframe)

I am doing loop like below which is slow

for index, row in df2.iterrows():
    df1[(df1['col1'] == row['col1']) 
                     & (df1['col2'] == df1['col2'])
                    &(df1['col3'] == df1['col3'])]['value'].values[0]

how to do it more efficiently/fast?

score 0 · Answer 1 · answered Aug 09 '17 at 07:51

0

You need merge with left join by columns for compare first:

print (df2)
  col1 col2  col3 value
0   LA    M    20    20
1   NY    M    20     ?

df = pd.merge(df2, df1, on=['col1','col2','col3'], how='left', suffixes=('','_'))

It create new column value_1 with matched values. Last use fillna for replace NaNs by original values and last remove helper column value_:

print (df)
  col1 col2  col3 value  value_
0   LA    M    20    20     NaN
1   NY    M    20     ?    21.0

df['value'] = df['value_'].fillna(df['value'])
df = df.drop('value_', axis=1)
print (df)
  col1 col2  col3 value
0   LA    M    20    20
1   NY    M    20    21

answered Aug 09 '17 at 07:51

jezrael

822,522
95
1,334
1,252

thanks. I was working on a case where '>' instead of '==' is needed, and totally forget I can do merge in this case – user843809 Aug 09 '17 at 17:02
Hmmm, it is totally different question and more compliacted. Can you update data? And you working with multiple columns - so if one column value is higher and next 2 are lower in one row - then what is output? – jezrael Aug 09 '17 at 17:11

How to efficiently do lookup in dataframe based on multiple columns

1 Answers1