4

I'm using the FuzzyWuzzy String Matching module from SeatGeek.

I find that when using the token_set_ratio search algorithm, small differences in case give wildly differing results.

For example, if I look for the phrase "I am eating" in a file, I get a 100% match. But if the phrase is "i am eating", a change in the case of just ONE letter, I get a 65% match.

Is there any way to make the algorithm case insensitive?

jww
shoi

4 Answers

3

token_set_ratio() is case insensitive by default.

from fuzzywuzzy import fuzz
# Both strings are lowercased internally before comparison
print(fuzz.token_set_ratio("I am eating", "i am eating"))  # prints 100
Foxan Ng
acslater00
  • Why does this answer have a -1? As far as I can see, it is telling the truth: it is case insensitive by default (the kwarg full_process=False would make it case sensitive) – The Hog Jul 18 '18 at 11:25
  • @SarunasAzna I can only guess at the reason for whoever gave the -1, but the answer states it is case sensitive, rather than insensitive. There are also other differences with token_set_ratio beyond just case sensitivity. – Nate Wanner Jul 18 '18 at 13:23
1

I had the same issue. You were probably using Ratio and not TokenSetRatio (fuzz.ratio rather than fuzz.token_set_ratio).
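To illustrate why a plain ratio reacts to case, here is a minimal sketch that uses only the standard library's difflib rather than FuzzyWuzzy itself. The helper simple_ratio is hypothetical, and it only approximates a character-level ratio with no preprocessing; the exact scores FuzzyWuzzy returns may differ depending on the matcher backend installed.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Character-level similarity on a 0-100 scale, with no preprocessing.

    Hypothetical helper for illustration; not part of FuzzyWuzzy.
    """
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Compared as-is, "I" and "i" count as different characters.
mixed = simple_ratio("I am eating", "i am eating")

# Lowercasing both sides first removes the difference.
lowered = simple_ratio("I am eating".lower(), "i am eating".lower())

print(mixed, lowered)  # 91 100
```

If the score changes when only the case changes, the comparison in use is most likely not lowercasing its inputs.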

0

If you go through the raw source of fuzz here, you will find that fuzz.token_set_ratio converts strings to lower case before doing the sequence matching.
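As a rough sketch of the kind of preprocessing described above, the following standard-library code normalizes a string and then compares token sets. The functions normalize and token_set are hypothetical stand-ins, assuming the preprocessing amounts to replacing non-alphanumeric characters with spaces, lowercasing, and trimming; this is not the library's actual code.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical stand-in for the library's preprocessing step."""
    # Assumption: anything that is not a letter or digit becomes a space.
    text = re.sub(r"[\W_]+", " ", text)
    return text.lower().strip()


def token_set(text: str) -> frozenset:
    """Split the normalized text into a set of unique tokens."""
    return frozenset(normalize(text).split())


# After normalization the two phrases yield the same token set,
# so a token-set comparison sees them as identical.
print(normalize("I am eating"))                             # i am eating
print(token_set("I am eating") == token_set("i am eating"))  # True
```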

Further, you may want to check this Stack Overflow post here from a SeatGeek engineer for more clarity on ratio usage.

Hope this helps

Nim J
0

I just converted the strings that I am comparing to lowercase:

fuzz.token_set_ratio("I am eating".lower(), "i am eating".lower())

This gives me a score of 100

Glenn