We have a third-party 'tool' which finds similar names and assigns a similarity score between two names.
I am supposed to mimic the tool's behavior as closely as possible. After searching the internet, I tried a string-distance approach using fuzzywuzzy:
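from fuzzywuzzy import fuzz, process

# name: the query string; choices: the candidate names to search
matches = process.extractBests(
    name,
    choices,
    score_cutoff=50,               # drop matches scoring below 50
    scorer=fuzz.token_sort_ratio,  # compare after sorting the tokens
    limit=1,                       # keep only the best match
)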
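For context, my understanding is that token_sort_ratio normalizes both strings, sorts their tokens alphabetically, and then computes an edit-based similarity from 0 to 100, so word order is ignored. The snippet below is only a rough standard-library sketch of that idea, not fuzzywuzzy's actual implementation: approx_token_sort_ratio is a hypothetical name, the ASCII-only normalization is a simplifying assumption, and the scores may differ slightly from the library's.

```python
import re
from difflib import SequenceMatcher


def approx_token_sort_ratio(a: str, b: str) -> int:
    """Rough stand-in for fuzz.token_sort_ratio (an approximation only)."""

    def normalize(s: str) -> str:
        # Lowercase, turn runs of non-alphanumeric (ASCII) characters into
        # spaces, then sort the resulting tokens alphabetically.
        tokens = re.sub(r"[^a-z0-9]+", " ", s.lower()).split()
        return " ".join(sorted(tokens))

    # SequenceMatcher.ratio() lies in [0, 1]; scale it to 0-100.
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


# Same tokens in a different order normalize to the same string.
print(approx_token_sort_ratio("John Smith", "Smith, John"))  # 100
# Spelling differences lower the score.
print(approx_token_sort_ratio("John Smith", "Jon Smyth"))
```

Seeing the scorer this way suggests where the outliers might come from: anything the tool treats specially beyond token order (nicknames, initials, phonetic spelling) would not be captured by this kind of measure.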
The fuzzywuzzy results were close to the tool's. However, there are a few outliers, as highlighted below.
After further searching, I have come to understand that further refinement will need some sort of machine learning. I am a complete newbie to machine learning, so I am seeking advice on what I should try next to refine the code further.
Thanks!