0

I have been trying to merge two excel sheets together based on their own name columns. I used this as a reference.

In df1 the name column is Local Customer df2 the name column is Client

import difflib
import pandas as pd

path=  'C:/Users/Rukgo/Desktop/Match thing/'

df1=pd.read_excel(path+'2015_TXNs.xls')
df2=pd.read_excel(path+'Pipeline.xls')
df2['Client'] = df2['Client'].apply(lambda x: difflib.get_close_matches(x, df1['Local Customer'])[0])

result = df1.join(df2)

result.to_excel(path+'maybe.xls')

Though when I run this I get a the following error:

File "C:/Users/Rukgo/Desktop/Match Things/untitled3.py", line 16, in <lambda>
df2['Client'] = df2['Client'].apply(lambda x: difflib.get_close_matches(x, df1['Local Customer'])[0])

IndexError: list index out of range
Community
  • 1
  • 1
Rukgo
  • 121
  • 1
  • 13

1 Answers1

1

It means that sometimes, no close matches are found, but you're asking for the first element [0] of an empty list

Add a condition to your lambda function to test if there are at least one result.

Here is the code if you decide to keep the element of the original DataFrame in case of no matched element.

def replacement(x, **kwargs):
    matches = difflib.get_close_matches(x, kwargs['l'])
    if ( matches ):
        return matches[0]
    else:
        return x


df2['Client'] = df2['Client'].apply( replacement(x, l=df1['Local Customer'] ) )
Ghilas BELHADJ
  • 13,412
  • 10
  • 59
  • 99
  • Awesome that doesn't throw any errors! thank you. Though when I open up the new excel sheet all the headers from `df2` are there with none of the information – Rukgo Mar 24 '16 at 23:22