0

I need pandas code to calculate % of matched records. Suppose I have two column Hotel_name and Property_name and total records is 100 and 30 records matched from both the column, then % matched records should be 30%.

  • Provide some example data and your expected output. [Here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) are good reads on how to make a good pandas question. – Erfan Oct 10 '20 at 10:42

1 Answers1

0

If you set index on both columns, you can do outer join and then count nulls on some required column.

Example:

x = [['hotel1','property1'],['hotel2','property2'],['hotel3','property3'],['hotel9','property9']]
y = [['hotel1','property1'],['hotel2','property2'],['hotel3','property3'],['hotel4','property4'],['hotel5','property5']]

df1 = pd.DataFrame(x, columns=['hotel_name','property_name'])
df2 = pd.DataFrame(y, columns=['hotel_name','property_name'])
df1['flag'] = 1
df2['flag'] = 1

df1.set_index(['hotel_name','property_name'], inplace=True)
df2.set_index(['hotel_name','property_name'], inplace=True)
ans = df1.join(df2, how='outer', lsuffix='x', rsuffix='y',sort=True)

print("Percent Match in X is: ", 100*(1-len(ans[ans.flagx.isna()]) / len(ans)))
print("Percent Match in Y is: ", 100*(1-len(ans[ans.flagy.isna()]) / len(ans)))
print(ans)

Result:

Percent Match in X is:  66.66666666666667
Percent Match in Y is:  83.33333333333334
                          flagx  flagy
hotel_name property_name              
hotel1     property1        1.0    1.0
hotel2     property2        1.0    1.0
hotel3     property3        1.0    1.0
hotel4     property4        NaN    1.0
hotel5     property5        NaN    1.0
hotel9     property9        1.0    NaN
S2L
  • 1,746
  • 1
  • 16
  • 20