I am trying to group similar company names using fuzzy matching within a single column. However, the names are not grouped correctly, and the resulting dataset does not have the same number of rows as the original. Because of one-to-many matches, the result has more rows than the original data.
Input File sample with more records
**Code**
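    import pandas as pd
    from fuzzywuzzy import fuzz  # assumed source of `fuzz`; thefuzz exposes the same names

    df.loc[:, 'Account Name Copy'] = df['Account Name']

    # Pair every account name with every account name (n * n pairs)
    compare = pd.MultiIndex.from_product([df['Account Name'],
                                          df['Account Name Copy']]).to_series()

    def metrics(tup):
        # Score one (name, name) pair with two different scorers
        return pd.Series([fuzz.ratio(*tup),
                          fuzz.token_sort_ratio(*tup)],
                         ['ratio', 'token'])

    compare.apply(metrics)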
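
To illustrate why the row count grows, here is a minimal sketch of what a Cartesian product of a column with itself produces. It uses `itertools.product` from the standard library as a stand-in for `pd.MultiIndex.from_product`, and the names in `names` are made up for illustration, not taken from my input file.

```python
from itertools import product

# Hypothetical sample names, not from the real input file
names = ["Acme Corp", "Acme Corporation", "Globex Inc"]

# Every name is paired with every name, including itself
pairs = list(product(names, names))

print(len(names))  # 3
print(len(pairs))  # 9
```

With n input rows the comparison has n * n rows, so scoring the pairs this way cannot keep the original row count without a further step that collapses the pairs back to one label per row.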
Current Output
P.S. The number of rows in the final output should be the same as in the original data, with similar company names grouped together.
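
To make the requirement concrete, here is a rough sketch of the kind of result I mean: one group label per input row, so the row count cannot change. It is only an illustration under assumptions. It swaps in `difflib.SequenceMatcher` from the standard library in place of `fuzz.ratio`, the helper names `similarity` and `assign_groups` are hypothetical, the threshold of 80 is arbitrary, and the sample names are made up.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case (stand-in for fuzz.ratio)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_groups(names, threshold=80):
    """Return one label per name: the first earlier representative it resembles.

    Greedy and order-dependent: a name that matches no existing
    representative becomes a new representative itself.
    """
    representatives = []
    labels = []
    for name in names:
        for rep in representatives:
            if similarity(name, rep) >= threshold:
                labels.append(rep)
                break
        else:
            # No representative was similar enough, so start a new group
            representatives.append(name)
            labels.append(name)
    return labels


# Hypothetical sample names, not from the real input file
names = ["Acme Corp", "ACME Corp.", "Globex Inc", "Globex Inc.", "Initech"]
labels = assign_groups(names)
for name, label in zip(names, labels):
    print(f"{name!r} -> {label!r}")
```

Because `labels` has exactly one entry per input name, something like `df['Group'] = assign_groups(df['Account Name'].tolist())` would add a group column without adding rows. The greedy approach depends on row order and on the chosen threshold, so it may not be the right grouping rule for real data.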
Desired Output
I referred to the questions below, but did not get the desired output:
https://stackoverflow.com/questions/71427827/fuzzy-matching-and-grouping
https://stackoverflow.com/questions/60987641/check-if-there-is-a-similar-string-in-the-same-column
https://stackoverflow.com/questions/62085777/fuzzy-match-within-the-same-column-python
Please help!