I have been working on an Excel file and looking out to find out accuracy percentage by using Fuzzy match / any other method from another sheet using an dataframe. Matching other column values using one column having unique values.
Input values:
Sheet 1
identity_no address Pincode company_name
IN2231 Delhi, Indi 110030 AXN pvt Ltd
UK654 London, Uk 897653 Aviva Intl Ltd
SL1432 Colombo, Srilanka 07658 Ship Incorporations
LK0678 Libya, Sns 5674332 Oppo Mobiles pvt ltd
Master Data Sheet 2
identity_no address Pincode company_name
IN2231 Delhi, India 110030 AXN pvt Ltd
UK654 London, Uk 897653 Aviva Intl Ltd
SL1432 Colombo, Srilanka 07658 Ship Incorporations
Looking out to map the column values from Master Data Sheet 2 and update the values into Sheet 1 only when matching accuracy percentage is more than 75% . based on identity_no. Also if any identity_no is not present in Sheet 2, We also need to update into sheet 2.
I have tried to use Fuzzy match bot couldn't worked
Script i have been trying to use so far is
df = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet1')
df2 = pd.read_excel(open(r'master_data.xlsx', 'rb'), sheet_name='sheet2')
from difflib import SequenceMatcher
lst = df2[['identity_no','address', 'Pincode', 'company_name']]
def closest(s):
highest = 0
result = ''
for i in lst:
temp = SequenceMatcher(None, s, i).ratio() #Similarity Ratio
if temp > highest:
highest = temp
result = i
return result
How to check for the matching accuracy percentage and only execute the further process when it is more than 75%
df[['identity_no','address', 'Pincode', 'company_name']] = df[['identity_no','address', 'Pincode', 'company_name']] .apply(lambda x: closest(x))