I want to check this: for user='Jefferey Roberts', fuzzywuzzy gives result=[('Jeremiah James Roberts Jr', 86), ('Jeffrey Scott Roberts', 81), ('Jeremiah J Roberts', 71)]

Code:

from fuzzywuzzy import process

user = 'Jefferey Roberts'
result = ['Jeremiah James Roberts Jr', 'Jeffrey Scott Roberts', 'Jeremiah J Roberts']
output = process.extract(user, result)
print(output)

I expected the second element of the list, 'Jeffrey Scott Roberts', to get the highest score.

Similarly, when I use get_close_matches from the difflib module on the list ['Gary Wayne Waller', 'Zayn Waller', 'Debra Kay Waller'] and search for 'Gary Waller', it returns 'Zayn Waller' at the first index instead of 'Gary Wayne Waller'.

Code:

from difflib import get_close_matches

user = 'Gary Waller'
result = ['Gary Wayne Waller', 'Zayn Waller', 'Debra Kay Waller']
output = get_close_matches(user, result)
print(output)

Please suggest a solution, or a more accurate module than fuzzywuzzy and get_close_matches.

1 Answer


You can use SequenceMatcher from difflib:

from difflib import SequenceMatcher

b = "Jefferey Roberts"
a_lst = ['Jeremiah James Roberts Jr', 'Jeffrey Scott Roberts', 'Jeremiah J Roberts']

for a in a_lst:
    print(a, SequenceMatcher(None, a, b).ratio())

Output:

Jeremiah James Roberts Jr 0.5853658536585366
Jeffrey Scott Roberts 0.8108108108108109
Jeremiah J Roberts 0.7058823529411765

Edit:

Check out this post on string similarity to see the range of algorithms and packages available for matching: Find the similarity metric between two strings
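
SequenceMatcher compares the two strings as a whole, so an extra middle name in a candidate lowers its ratio in much the same way a misspelling does. That is why a query like 'Gary Waller' can score higher against 'Zayn Waller' than against 'Gary Wayne Waller'. A minimal sketch of one possible workaround, assuming the names are space-separated words: score each word of the query against its best-matching word in the candidate and average those scores. The helpers token_score and rank_names are hypothetical names made up for this illustration, not part of difflib.

```python
from difflib import SequenceMatcher


def token_score(query, candidate):
    """Average, over the query's words, of the best per-word ratio in candidate.

    Hypothetical helper for illustration; only SequenceMatcher comes from difflib.
    """
    query_tokens = query.lower().split()
    candidate_tokens = candidate.lower().split()
    if not query_tokens or not candidate_tokens:
        return 0.0
    total = 0.0
    for q in query_tokens:
        # Best match for this query word among the candidate's words.
        total += max(SequenceMatcher(None, q, c).ratio() for c in candidate_tokens)
    return total / len(query_tokens)


def rank_names(query, candidates):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, token_score(query, name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank_names('Gary Waller', ['Gary Wayne Waller', 'Zayn Waller', 'Debra Kay Waller']))
print(rank_names('Jefferey Roberts', ['Jeremiah James Roberts Jr', 'Jeffrey Scott Roberts', 'Jeremiah J Roberts']))
```

With this scoring, 'Gary Wayne Waller' ranks first for 'Gary Waller' and 'Jeffrey Scott Roberts' ranks first for 'Jefferey Roberts'. The trade-off is that extra words in a candidate are never penalized, so any candidate containing every query word gets a perfect score; if that matters, a length penalty or a tie-break on the whole-string ratio could be added.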

Sachin Kohli
  • But if you take the list ['Gary Wayne Waller', 'Zayn Waller', 'Debra Kay Waller'] and search for 'Gary Waller', it returns a higher value for 'Zayn Waller' than for 'Gary Wayne Waller'. – faizan khan Oct 01 '22 at 15:45
  • I've edited my answer with a link. You can probably find a module there that works for your problem. Thanks – Sachin Kohli Oct 01 '22 at 17:22