2

I'm trying to use Vlookup in Python from two excel files. One file is an original file and the other one is run through some logics. In the original file, column 3 is empty and the other file is filled with data in column 3.

Here is the original file:

 #Original file
 col1 | col2 | col3
   a     b      
   c     d
   e     f
   g     h
   i     j 
   k     l
   m     n 
   ...

Here is the second file:

 #Second file
 col1 | col2 | col3
   a     b      1
   c     d      2
   e     f      3
   g     h      4
   i     j      3
   k     l      1
   m     n      2
   ...

What I want to do is to merge these files into one by using 'VLOOKUP' function. Here is my code:

import pandas as pd
import openpyxl

df1 = pd.read_excel("original.xlsx")
df2 = pd.read_excel("second.xlsx")

#Merge
result = pd.merge(df1, df2, on = ['col1'], how='left')
result_columns = result.columns.str.replace('_x|_y','', regex=True)
result.to_excel("result.xlsx", index=False)

However, when I run through the above code, the file with results shows only the merged data and not the filled data in column 3. Also, there are duplicated columns and data generated with few blanks at the end of original columns in the result file.

Here is my result file:

col1 | col2 | col3 | col2 | col3
  a     b             b     1      
  c     d             d     2
  e     f             f     
  g     h             h     4
  i     j             j     
  k     l             l     1
  m     n             n     2
  ...
  

My desired output is if second file col1 would match with col1 in the original file, fill the 'col3' in original file

For example:

col1 | col2 | col3
  a      b     1
  c      d     2
  e      f     3
  g      h     4
  i      j     3
  k      l     1
  m      n     2
  ...

Anyone has any idea?

  • 1
    in df2, when col1 == 'a', col3 == 1. So why do you have 4 in col3 when col1 == a in your desired output? In fact, I think all of them are incorrect except when col1 == e? – sophocles Apr 02 '21 at 10:55
  • 1
    You seem to have a mistake in last example, it doesn't fit your explanation (all rows except for third are filled wrong?). Look at `df.drop` maybe or select directly: `result = result[['col1_x','col2_x','col3_y']]` `result.columns = ['col1','col2','col3']` instead of just omitting suffixes. – STerliakov Apr 02 '21 at 11:00
  • 1
    given your edit, why would ```i, k , m``` get ```3, 1 ,2``` respectively in your desired output given that we don't have them in your second file? – sophocles Apr 02 '21 at 11:06
  • Hi sophocles, thanks I edited and I was wrong to write it – Dong Gyu Lee Apr 02 '21 at 11:10
  • 1
    Thanks. Seems correct now. You can use ```df1['col3']=df1['col1'].map(dict(zip(df2.col1,df2.col3))) ``` and it should get you what you need. There are plenty of [answers](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict) like these. Let us know if it doesn't work. – sophocles Apr 02 '21 at 11:12

0 Answers0