I am using fuzzywuzzy to measure the similarity between sentences.

When I compare these two sentences, 'user attempts login' and 'acceptance criteria':

fuzz.token_set_ratio('user attempts login', 'acceptance criteria')

it gives me a score of 42.

Could someone please help me understand how we get a score of 42 when there are no matching words?

Hari
  • Check out [When to use which fuzz function to compare 2 strings](https://stackoverflow.com/questions/31806695/when-to-use-which-fuzz-function-to-compare-2-strings) – DarrylG Mar 23 '21 at 14:07

1 Answer

Steps of the Algorithm

fuzz.token_set_ratio performs the following steps:

  1. split each sentence into words and remove duplicate words
  2. create three lists of
    • remainder1 = words that are only in the first sentence
    • remainder2 = words that are only in the second sentence
    • intersection = words that are in both sentences
  3. sort the words in each of the three lists and join them with spaces into one string per list
    • sorted_remainder1
    • sorted_remainder2
    • sorted_intersection
  4. join the strings in the following way:
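    • combined1 = <sorted_intersection> <sorted_remainder1> (joined with a space, surrounding whitespace stripped)
    • combined2 = <sorted_intersection> <sorted_remainder2> (joined with a space, surrounding whitespace stripped)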
  5. calculate the following similarities:
    • fuzz.ratio(sorted_intersection, combined1)
    • fuzz.ratio(sorted_intersection, combined2)
    • fuzz.ratio(combined1, combined2)
  6. return the maximum of those similarities
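
The steps above can be sketched in plain Python. This is only an illustration, not fuzzywuzzy's actual code: the names lcs_length, simple_ratio and token_set_ratio_sketch are made up for this example, the preprocessing is reduced to lowercasing and whitespace splitting, and simple_ratio assumes the character-level score 2 * LCS / (len(a) + len(b)), which is how fuzz.ratio should behave when the python-Levenshtein backend is installed.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of characters (hypothetical helper)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simple_ratio(a: str, b: str) -> int:
    """Stand-in for fuzz.ratio (assumption: 2 * LCS / total length, scaled to 0-100)."""
    if not a or not b:
        # Assumption: an empty string scores 0 against anything.
        return 0
    return round(100 * 2 * lcs_length(a, b) / (len(a) + len(b)))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    # Step 1: split into words and drop duplicates.
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())

    # Steps 2 and 3: build the three word groups, sort them, join with spaces.
    sorted_intersection = " ".join(sorted(tokens1 & tokens2))
    sorted_remainder1 = " ".join(sorted(tokens1 - tokens2))
    sorted_remainder2 = " ".join(sorted(tokens2 - tokens1))

    # Step 4: prefix each remainder with the shared words.
    combined1 = f"{sorted_intersection} {sorted_remainder1}".strip()
    combined2 = f"{sorted_intersection} {sorted_remainder2}".strip()

    # Steps 5 and 6: compare the three pairs and keep the best score.
    return max(
        simple_ratio(sorted_intersection, combined1),
        simple_ratio(sorted_intersection, combined2),
        simple_ratio(combined1, combined2),
    )


print(token_set_ratio_sketch("user attempts login", "acceptance criteria"))  # 42
```

With the two sentences from the question, the sorted strings are 19 characters each and share a common character subsequence of length 8 (for example a, e, p, t, space, i, e, r), so the sketch yields round(100 * 16 / 38) = 42 even though no whole word matches.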

Example

For the strings 'user attempts login' and 'acceptance criteria', this leads to the following result:

remainder1 = ['user', 'attempts', 'login']
remainder2 = ['acceptance', 'criteria']
intersection = []
sorted_remainder1 = 'attempts login user'
sorted_remainder2 = 'acceptance criteria'
sorted_intersection = ''
combined1 = 'attempts login user'
combined2 = 'acceptance criteria'

fuzz.ratio(sorted_intersection, combined1) = 0
fuzz.ratio(sorted_intersection, combined2) = 0
fuzz.ratio(combined1, combined2) = 42

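fuzz.ratio compares the strings character by character rather than word by word, so two sentences with no words in common can still score well above 0. Because the intersection is empty here, token_set_ratio reduces to the same comparison as fuzz.token_sort_ratio, which only sorts the words in both sentences and compares them using fuzz.ratio.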

maxbachmann