Strange behaviour in FuzzyWuzzy extract

Question

I'm trying to use FuzzyWuzzy to correct misspelled names in a text. However, I can't get process.extract and process.extractOne to behave the way I expected.

from fuzzywuzzy import process

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text)

print(found_word)

This results in:

[('e', 90), ('VEIGA', 80), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]

How can I get FuzzyWuzzy to correctly identify 'VEIGA' as the correct response?

score 2 · Accepted Answer · answered May 22 '18 at 13:13

You can try passing a different scorer, such as fuzz.token_set_ratio or fuzz.token_sort_ratio. The answers to "When to use which fuzz function to compare 2 strings" give an excellent explanation of the differences.

For completeness, here is a bit of code:

from fuzzywuzzy import process
from fuzzywuzzy import fuzz

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text, scorer=fuzz.token_sort_ratio)

print(found_word)

Output:

[('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
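To see why 'e' can outrank 'VEIGA', here is a rough standard-library sketch. It is not FuzzyWuzzy's actual implementation. As far as I understand, the default scorer (fuzz.WRatio) also takes substring ("partial") matches into account, so a one-letter word that occurs inside the search term scores very high. The helper names full_ratio and best_substring_ratio are made up for illustration, and difflib stands in for the library's matcher.

```python
from difflib import SequenceMatcher


def full_ratio(a: str, b: str) -> int:
    """Similarity of the two whole strings on a 0-100 scale, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_substring_ratio(a: str, b: str) -> int:
    """Best similarity between the shorter string and any window of the
    same length taken from the longer string (a crude 'partial' match)."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    windows = (long_[i:i + len(short)]
               for i in range(len(long_) - len(short) + 1))
    return max(round(100 * SequenceMatcher(None, short, w).ratio())
               for w in windows)


words = 'VICTOR HUGO e MARIANA VEIGA'.split()
search_term = 'VEYGA'

# Rank the candidates under each scoring rule, best match first.
by_full = sorted(words, key=lambda w: full_ratio(search_term, w), reverse=True)
by_partial = sorted(words, key=lambda w: best_substring_ratio(search_term, w),
                    reverse=True)

print(by_full[0])     # best match when whole strings are compared
print(by_partial[0])  # best match when substring matches are allowed
```

Comparing whole strings puts 'VEIGA' first, while allowing substring matches lets 'e' win because it appears verbatim inside 'VEYGA'. Choosing a scorer without the partial component, as above, avoids that.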

Thank you so much. The link to the other answer is very helpful! — RogB, May 22 '18 at 13:48
