1

I want python to perform updating of values next to a value found in both dataframes (somewhat similar to VLOOKUP in MS Excel). So, for

import pandas as pd
df1 = pd.DataFrame(data = {'col1':['a', 'b', 'd'], 'col2': [1, 2, 4], 'col3': [2, 3, 4]})
df2 = pd.DataFrame(data = {'col1':['a', 'f', 'c', 'd']})
In [3]: df1
Out[3]: 
        col1    col2    col3
      0  a        1       2
      1  b        2       3
      2  d        4       4

In [4]: df2
Out[4]: 
        col1    
      0  a        
      1  f        
      2  c       
      3  d

Outcome must be the following:

In [6]: df3 = *somecode*
        df3
Out[6]: 
        col1    col2    col3
      0  a        1       2
      1  f                
      2  c                
      3  d        4       4

The main part is that I want some sort of "for loop" to do this.

So, for instance python searches for first value in col1 in df2, finds it in df1, and updates col2 and col3 respectivly, then moves forward.

reebyt
  • 13
  • 5

2 Answers2

0

First for loop in pandas is best avoid if some vectorized solution exist.

I think merge with left join is necessary, parameter on should be omit if only col1 is same in both DataFrames:

df3 = df2.merge(df1, how='left')
print (df3)
  col1  col2  col3
0    a   1.0   2.0
1    f   NaN   NaN
2    c   NaN   NaN
3    d   4.0   4.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks a lot, but why is it best to avoid, and where can I read something about it? I have not found something meaningful on the internet about it. – reebyt Jun 01 '18 at 07:02
  • @reebyt - loops are slow, it is main reason. Check [this](https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues/24871316#24871316) - `Jeff` is now one of pandas developr now. – jezrael Jun 01 '18 at 07:04
0

try this,

Simple left join will solve your problem,

pd.merge(df2,df1,how='left',on=['col1'])

  col1  col2  col3
0    a   1.0   2.0
1    f   NaN   NaN
2    c   NaN   NaN
3    d   4.0   4.0
Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111