0

Working on an Excel file and looking to find out accuracy percentage by matching values from another sheet using an dataframe. Matching other column values using one column having unique values.

I have tried using Fuzzy match / any other possible method but didn't worked

Input Data: Sheet 1:

identity_no  address            Pincode   company_name

 IN2231      Delhi, Indi        110030    AXN pvt Ltd
 UK654       London, Uk         897653    Aviva Intl Ltd
 SL1432      Colombo, Srilanka  07658     Ship Incorporations
 LK0678      Libya, Sns         5674332   Oppo Mobiles pvt ltd

Master Data Sheet 2

identity_no  address            Pincode   company_name

 IN2231      Delhi, India       110030    AXN pvt Ltd
 UK654       London, Uk         897653    Aviva Intl Ltd
 SL1432      Colombo, Srilanka  07658     Ship Incorporations

Expected Output:

identity_no  address            Pincode   company_name               match_percent
    
     IN2231      Delhi, Indi        110030    AXN pvt Ltd                
     UK654       London, Uk         897653    Aviva Intl Ltd
     SL1432      Colombo, Srilanka  07658     Ship Incorporations
     LK0678      Libya, Sns         5674332   Oppo Mobiles pvt ltd

Code i have tried so far:

df = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet1')
df2 = pd.read_excel(open(r'master_data.xlsx', 'rb'), sheet_name='sheet2')

for index, row in df.iterrows():
    for index_config, val_new in df2.iterrows():
        if row['identity_no  '] == row_config['identity_no']:
           df[['identity_no','address', 'Pincode', 'company_name']][Index] = val_config[['identity_no','address', 'Pincode', 'company_name']]

Here mapping the values from sheet2 to sheet1 but i am also looking to find out the accuracy of the matching on How accurately the columns have been matched.

Any Suggestions.

NKJ
  • 457
  • 1
  • 4
  • 11
  • `df.merge(df2,on=['identity_no','address','Pincode','company_name'])` and how you find accuracy? any calculations or what? – Anurag Dabas Apr 05 '21 at 09:18
  • @AnuragDabas - I am trying to find the accuracy percentage by matching the Column values from sheet 1 with sheet 2. How accurately the values are matching. – NKJ Apr 05 '21 at 09:41
  • `df.merge(df2,on=['identity_no','address','Pincode','company_name']).assign(accuracy='100%')`? – Anurag Dabas Apr 05 '21 at 09:53

1 Answers1

0

So if I understand correctly, you have one dataframe with some data df that you want to match against the indices in a template df2 and for each matching index, you want to compute the number of elements that are alike.

# For simplicity, let's define the index of the dataframes
df = df.set_index("identity_no")
df2 = df2.set_index("identity_no")

# You define a function that returns NaN when index does not exist and the accuracy score if it does (from 0 to 1)
def accuracy_score(row):
    if row not in df2.index:
        return float("nan")
    return sum(row[col] == df2.loc[row.name, col] for col in row.index) / len(row)

# You apply the function to your dataframe
df["accuracy"] = df.apply(accuracy_score, axis=1)
qmeeus
  • 2,341
  • 2
  • 12
  • 21
  • How to check and find accuracy to match an specific column with another – NKJ Apr 07 '21 at 12:51
  • Don't know what you mean, can you rephrase? – qmeeus Apr 07 '21 at 12:53
  • I want to find the accuracy percentage by matching the values of two columns from different sheets. and update the percentage value in match_percent column. Lets suppose Row1 from both the columns matches by 85% so will update in match_percent column. – NKJ Apr 07 '21 at 18:45