I need pandas code to calculate % of matched records. Suppose I have two column Hotel_name and Property_name and total records is 100 and 30 records matched from both the column, then % matched records should be 30%.
Asked
Active
Viewed 26 times
0
-
Provide some example data and your expected output. [Here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [here](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) are good reads on how to make a good pandas question. – Erfan Oct 10 '20 at 10:42
1 Answers
0
If you set index on both columns, you can do outer join and then count nulls on some required column.
Example:
x = [['hotel1','property1'],['hotel2','property2'],['hotel3','property3'],['hotel9','property9']]
y = [['hotel1','property1'],['hotel2','property2'],['hotel3','property3'],['hotel4','property4'],['hotel5','property5']]
df1 = pd.DataFrame(x, columns=['hotel_name','property_name'])
df2 = pd.DataFrame(y, columns=['hotel_name','property_name'])
df1['flag'] = 1
df2['flag'] = 1
df1.set_index(['hotel_name','property_name'], inplace=True)
df2.set_index(['hotel_name','property_name'], inplace=True)
ans = df1.join(df2, how='outer', lsuffix='x', rsuffix='y',sort=True)
print("Percent Match in X is: ", 100*(1-len(ans[ans.flagx.isna()]) / len(ans)))
print("Percent Match in Y is: ", 100*(1-len(ans[ans.flagy.isna()]) / len(ans)))
print(ans)
Result:
Percent Match in X is: 66.66666666666667
Percent Match in Y is: 83.33333333333334
flagx flagy
hotel_name property_name
hotel1 property1 1.0 1.0
hotel2 property2 1.0 1.0
hotel3 property3 1.0 1.0
hotel4 property4 NaN 1.0
hotel5 property5 NaN 1.0
hotel9 property9 1.0 NaN

S2L
- 1,746
- 1
- 16
- 20