
We have a third party 'tool' which finds similar names and assigns a similarity score between two names.

I am supposed to mimic the tool's behavior as closely as possible. After searching the internet, I tried a string-distance approach, using fuzzywuzzy:

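```python
from fuzzywuzzy import fuzz, process

# Return only the single best candidate in `choices` for `name` that scores
# at least 50 (0-100 scale); token_sort_ratio ignores word order.
matches = process.extractBests(
    name,
    choices,
    score_cutoff=50,
    scorer=fuzz.token_sort_ratio,
    limit=1,
)
```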

The results were close to the tool's. However, there are a few outliers, highlighted below.

[Image: table comparing the tool's scores (first column) with the fuzzywuzzy scores, with the outliers highlighted]
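To make the comparison with the tool easier to reason about, here is a rough, stdlib-only sketch of what `fuzz.token_sort_ratio` does: normalize both names, sort their tokens, then take a character-level similarity ratio of the re-joined strings. It approximates fuzzywuzzy rather than reproducing it exactly (fuzzywuzzy uses python-Levenshtein for the ratio when it is installed, and also strips non-ASCII characters by default, so scores can differ slightly); `token_sort_ratio_sketch` is a hypothetical name used only for illustration.

```python
import re
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    # Lowercase and turn runs of non-word characters (punctuation, commas)
    # into single spaces, roughly like fuzzywuzzy's default processor.
    return re.sub(r"\W+", " ", name.lower()).strip()


def token_sort_ratio_sketch(a: str, b: str) -> int:
    # Sort the tokens so "Smith John" and "John Smith" become the same string,
    # then compare the results character by character on a 0-100 scale.
    sorted_a = " ".join(sorted(_normalize(a).split()))
    sorted_b = " ".join(sorted(_normalize(b).split()))
    return round(100 * SequenceMatcher(None, sorted_a, sorted_b).ratio())
```

For example, `token_sort_ratio_sketch("Smith, John", "John Smith")` returns 100 because the sorted token strings are identical, while `"Jon Smith"` vs `"John Smith"` returns 95 because of a single missing character. Word order is neutralized, but every character-level difference still costs points.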

After further searching, I have come to understand that refining this further would require some sort of machine learning. I am a complete newbie to machine learning, so I am looking for advice on what to try next to refine the code.

Thanks!

Soumya
  • https://stackoverflow.com/questions/2923420/what-is-a-simple-fuzzy-string-matching-algorithm-in-python – Chris_Rands May 27 '19 at 13:34
  • Can I ask what 3rd party tool you were using for the first column? – Stpete111 Jul 01 '20 at 20:19
  • @Stpete111 The tool is bridger - https://risk.lexisnexis.com/products/bridger-insight-xg – Soumya Jul 11 '20 at 02:33
  • Thanks. Ah ok, so an actual full search solution. I thought you meant a 3rd-party name-matching algorithm that you could implement in your own code. – Stpete111 Jul 11 '20 at 20:13

2 Answers


Take a look at the HMNI package. It is tailor-made for name matching.

Yash M

Take a look at the Jaccard and Levenshtein algorithms for fuzzy string matching. Both are relatively simple and can be implemented in about 40 or 50 lines of code.
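A minimal sketch of both in Python, to match the question's code. Levenshtein distance counts single-character insertions, deletions and substitutions; the Jaccard index here compares sets of character bigrams. Scaling both to 0-100 (to line up with fuzzywuzzy's scores) and using character bigrams rather than word tokens for Jaccard are illustrative choices, not requirements of either algorithm, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance rescaled to 0-100, where 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def char_ngrams(s: str, n: int = 2) -> set:
    """Set of lowercase character n-grams; short strings become one gram."""
    s = s.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard index |A & B| / |A | B| of the n-gram sets, scaled to 0-100."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 100.0
    return 100.0 * len(x & y) / len(x | y)
```

For instance, `levenshtein("kitten", "sitting")` is 3, and `"jon smith"` vs `"john smith"` has distance 1 (one inserted "h"). Computing several such scores per name pair also gives you features you could later combine or calibrate against the tool's scores.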

Michael Bianconi