I have two DataFrames, df1 and df2. I want to use fuzzywuzzy to fuzzy-match the strings in column A of df1 against column A of df2 and, when the match ratio is high enough, return the ID from column B of df2.
For example:
df1 looks like this:
Name
Sally sells Seashells
df2 looks like this:
Name | ID
Sally slls sshells | 28904
For each value in column A of df1, I want to find a match in column A of df2 and return the corresponding ID from column B of df2.
I would like to be able to set the threshold for the fuzzy ratio. For example, it should only return an ID if the ratio is above 50.
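To make the threshold rule concrete, here is a minimal sketch in plain Python. It is a stand-in, not fuzzywuzzy itself: it swaps in the standard library's difflib.SequenceMatcher, whose ratio scaled to 0-100 should match fuzz.ratio when fuzzywuzzy runs without python-Levenshtein (its pure-Python fallback uses difflib). The helper names ratio_0_100 and best_match_id, the (name, ID) pair layout, and the "Peter Piper" row are hypothetical.

```python
from difflib import SequenceMatcher


def ratio_0_100(a, b):
    """Similarity of two strings on a 0-100 scale (difflib stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match_id(name, candidates, threshold=50):
    """Return the ID of the best-scoring candidate, or None if its score is not above threshold.

    candidates: iterable of (candidate_name, candidate_id) pairs (hypothetical layout).
    """
    best_score, best_id = -1, None
    for cand_name, cand_id in candidates:
        score = ratio_0_100(name, cand_name)
        if score > best_score:
            best_score, best_id = score, cand_id
    # Apply the cutoff only to the single best match
    return best_id if best_score > threshold else None


candidates = [("Sally slls sshells", 28904)]
print(best_match_id("Sally sells Seashells", candidates))  # 28904
print(best_match_id("Peter Piper", candidates))            # None
```

Taking the best candidate first and then applying the cutoff means each df1 name yields at most one ID, rather than one line per candidate that clears the threshold.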
My current code:
import pandas as pd
from fuzzywuzzy import fuzz

df1 = pd.read_csv('C:\\Users\\nkurdob\\Desktop\\Sheet1.csv')
df2 = pd.read_csv('C:\\Users\\nkurdob\\Desktop\\Sheet2.csv')

# Compare each name in df1 column A with each name in df2 column A
for em in df1['A']:
    for name, id_ in zip(df2['A'], df2['B']):
        test = fuzz.partial_ratio(em, name)
        # Only report the ID from df2 column B when the ratio is above 50
        if test > 50:
            print(em, id_)