Looking out to find the highest accuracy percentage between 2 column values by using Fuzzy string matching.
I have 2 dataframes where i am trying to use fuzzy match between an specific columns values from both the dataframes.
Lets say df1 has 5 rows and df2 has 4 rows, i want to pick the values of each row of df1 and match with every row of df2 and find the highest accuracy. Suppose Row1 from df1 has been compared with all the rows from df2 so whichever the row from df2 has highest accuracy we will consider it as output. same to be considered for each row from df1.
Input Data :
Dataframe1
id_number company_name match_acc
IN2231D AXN pvt Ltd
UK654IN Aviva Intl Ltd
SL1432H Ship Incorporations
LK0678G Oppo Mobiles pvt ltd
NG5678J Nokia Inc
Dataframe2
identity_no Pincode company_name
IN2231 110030 AXN pvt Ltd
UK654IN 897653 Aviva Intl Ltd
SL1432 07658 Ship Incorporations
LK0678G 120988 Oppo Mobiles Pvt Ltd
Want to find the highest accuracy percentage and submit the values in match_acc column.
Code i have been using currently:
df1 = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet1')
df2 = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet2')
from fuzzywuzzy import fuzz
for index, row in df1.iterrows():
df1['match_acc']= fuzz.partial_ratio(df1['id_number'], df2['identity_no'])
print(df1['match_acc'])
I have been using Fuzzywuzzy, If there is any other method as well pls suggest.
Any Suggestion.