0

I am looking to execute the script only when it satisfies the condition.

If Column1 is not blank then only we can use the below script else will print the message. I have tried several ways but couldn't find the possible way to work.

Sheet1

id_number  company_name        match_acc

IN2231D    AXN pvt Ltd
UK654IN    Aviva Intl Ltd
           Ship Incorporations
LK0678G    Oppo Mobiles pvt ltd
NG5678J    Nokia Inc

Sheet2

identity_no   Pincode   company_name

 IN2231        110030    AXN pvt Ltd
 UK654IN       897653    Aviva Intl Ltd
 SL1432        07658     Ship Incorporations
 LK0678G       120988    Oppo Mobiles Pvt Ltd

Script i have been using

df1 = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet1')
df2 = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet2')

if df1[['id_number']] is not NaN:
  cross = df1[['id_number']].merge(df2[['identity_no']], how='cross')
  cross['match_acc'] = cross.apply(lambda x: fuzz.ratio(x.id_number, x.identity_no), axis=1)
  df1['match_acc'] = df1.id_number.map(cross.groupby('id_number').match_acc.max()) 

How we can execute it if i try using it within a function :

def word(x,y):
   cross = df1[['id_number']].merge(df2[['identity_no']], how='cross')
   cross['match_acc'] = cross.apply(lambda x: fuzz.ratio(x.id_number, y.identity_no), axis=1)
   df1.id_number.map(cross.groupby('id_number').match_acc.max()) 

df['match_acc'] = df1.apply(lambda x:word if (x['id_number'] == 'NaN') else 'No',1)

Please suggest

NKJ
  • 457
  • 1
  • 4
  • 11

2 Answers2

0

Is this what you're looking for

def word(df1):
   cross = df1[['id_number']].merge(df2[['identity_no']], how='cross')
   cross['match_acc'] = cross.apply(lambda x: fuzz.ratio(x.id_number, x.identity_no), axis=1)
   return df1.id_number.map(cross.groupby('id_number').match_acc.max()) 

df['match_acc'] = df1.apply(lambda x:word(x) if (x['id_number'] == 'NaN') else 'No',axis=1)
lyncx
  • 680
  • 4
  • 8
  • TypeError: object of type 'float' has no len() – NKJ May 04 '21 at 05:42
  • When it comes to " cross['match_acc'] = cross.apply(lambda x: fuzz.ratio(x.id_number, x.identity_no), axis=1)", gives the error. – NKJ May 04 '21 at 05:47
  • There was a typo in df['match_acc'] = df1.apply(lambda x:word(x) if (x['id_number'] == 'NaN') else 'No',axis=1) replace df1 with df in df1.apply( ... ) – lyncx May 04 '21 at 05:52
-1

I'm not sure how your end result is supposed to look, but I assume you want to merge df1 and df2 based on a fuzzy match on id_number/ identity.

Try these related answers that merge Pandas dataframes with fuzzy logic:

Fuzzy matching to join two dataframe

or try the example below from another SO answer. This might solve your problem without additional complex code.

import pandas as pd
import fuzzy_pandas as fpd

df1 = pd.DataFrame({'Key':['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key':['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})

results = fpd.fuzzy_merge(df1, df2,
            left_on='Key',
            right_on='Key',
            method='levenshtein',
            threshold=0.6)

results.head()
  Key    Key
0 Apple  Aple
1 Banana Bannanna
2 Orange Orag

Source: https://stackoverflow.com/a/60634160/13530653

Mickster
  • 189
  • 1
  • 6