FuzzyWuzzy String Matching - Case Sensitivity

Question

I'm using the FuzzyWuzzy String Matching module from SeatGeek.

I find that when using the token_set_ratio search algorithm, small differences in case give wildly differing results.

For example, if I look for the phrase "I am eating" in a file, I get a 100% match. But if the phrase is "i am eating", a change in the case of just ONE letter, I get a 65% match.

Is there any way to make the algorithm case insensitive?

You could just call `.upper()` on everything? – Andy Hayden May 09 '13 at 13:39

score 3 · Answer 1 · edited Oct 10 '18 at 14:12

token_set_ratio() is case insensitive by default.

from fuzzywuzzy import fuzz
fuzz.token_set_ratio("I am eating", "i am eating")  # => 100

edited Oct 10 '18 at 14:12 by Foxan Ng

answered Jan 09 '14 at 17:05 by acslater00

Why does this answer have -1? As far as I can see, it is telling the truth: it is case insensitive by default (the kwarg full_process=False would make it case sensitive). – The Hog Jul 18 '18 at 11:25
@SarunasAzna I can only presume on behalf of whoever gave the -1, but the answer stated that it is case sensitive, rather than insensitive. There are also other differences with token_set_ratio beyond just case sensitivity. – Nate Wanner Jul 18 '18 at 13:23

score 1 · Answer 2 · answered Nov 05 '20 at 11:02

I had the same issue. You were probably using Ratio and not TokenSetRatio...
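To illustrate why a plain ratio reacts to case, here is a minimal sketch that uses the standard library's `difflib.SequenceMatcher` as a stand-in for a character-level ratio. The helper name `plain_ratio` is hypothetical, and treating it as equivalent to the library's own ratio function is an assumption; the point is only that a raw character comparison without preprocessing sees "I" and "i" as different characters.

```python
from difflib import SequenceMatcher


def plain_ratio(a: str, b: str) -> int:
    """Character-level similarity on a 0-100 scale, with no preprocessing.

    Hypothetical helper: a stand-in for a plain ratio, not the library's code.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Identical strings score 100.
same = plain_ratio("I am eating", "I am eating")

# One letter differs in case, so the score drops below 100.
different_case = plain_ratio("I am eating", "i am eating")

# Lowercasing both inputs first removes the difference again.
lowered = plain_ratio("I am eating".lower(), "i am eating".lower())

print(same, different_case, lowered)
```

If the scores you see change with case, it is worth checking which ratio function is actually being called before changing anything else.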

answered Nov 05 '20 at 11:02 by Pieter Buitelaar

score 0 · Answer 3 · answered Sep 21 '17 at 13:47

If you read the source code of the fuzz module, you will find that fuzz.token_set_ratio converts strings to lowercase before doing the sequence matching.

Further, you may want to check this Stack Overflow post from a SeatGeek engineer for more detail on ratio usage.
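The behaviour described above can be sketched in plain Python. This is a simplified approximation written for illustration, not the library's actual source: the names `simple_process` and `token_set_ratio_sketch` and the `lowercase` switch are hypothetical, and `difflib` stands in for the library's sequence matcher.

```python
import re
from difflib import SequenceMatcher


def simple_process(s: str) -> str:
    # Assumed preprocessing: replace non-alphanumerics with spaces, lowercase, trim.
    return re.sub(r"[^0-9a-zA-Z]+", " ", s).lower().strip()


def ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(s1: str, s2: str, lowercase: bool = True) -> int:
    """Approximate token-set comparison; `lowercase` toggles the preprocessing."""
    if lowercase:
        s1, s2 = simple_process(s1), simple_process(s2)
    t1, t2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(t1 & t2))   # tokens shared by both strings
    only1 = " ".join(sorted(t1 - t2))    # tokens only in the first string
    only2 = " ".join(sorted(t2 - t1))    # tokens only in the second string
    c1 = (common + " " + only1).strip()
    c2 = (common + " " + only2).strip()
    # Take the best of the three pairwise comparisons.
    return max(ratio(common, c1), ratio(common, c2), ratio(c1, c2))


print(token_set_ratio_sketch("I am eating", "i am eating"))                   # with lowercasing
print(token_set_ratio_sketch("I am eating", "i am eating", lowercase=False))  # without
```

With lowercasing enabled, both inputs reduce to the same token set and the sketch returns 100; with it disabled, "I" and "i" count as different tokens and the score falls below 100.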

Hope this helps

score 0 · Answer 4 · answered Mar 16 '23 at 18:49

I just converted the strings that I am comparing to lowercase:

fuzz.token_set_ratio("I am eating".lower(), "i am eating".lower())

This gives me a score of 100.
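If the text may contain non-English letters, `str.casefold()` is a more aggressive alternative to `str.lower()` for this kind of normalisation. A small sketch, assuming you normalise both strings yourself before passing them to the matcher; the helper name `normalize` is hypothetical:

```python
def normalize(s: str) -> str:
    """Case-fold the text and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(s.casefold().split())


print(normalize("I  am Eating"))
print(normalize("Straße") == normalize("STRASSE"))
```

For plain ASCII input, `lower()` and `casefold()` give the same result, so this only matters for characters such as the German "ß".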

answered Mar 16 '23 at 18:49 by Glenn
