I'm using Python fuzzywuzzy to find matches in a list of sentences:

from fuzzywuzzy import fuzz, process

def getMatches(needle):
    return process.extract(needle, bookSentences, scorer=fuzz.token_sort_ratio, limit=3)

I'm trying to print out the match plus the sentences around it:

for match in matches:
     matchIndex = bookSentences.index(match)
     sentenceIndices = range(matchIndex-2,matchIndex+2)
     for index in sentenceIndices:
         print bookSentences[index],
     print '\n\n'

Unfortunately, the script fails to find the match in the original list:

ValueError: (u'Thus, in addition to the twin purposes mentioned above, this book is written for at least two groups: 1.', 59) is not in list

Is there a better way to find the index of the match in the original list? Can fuzzywuzzy somehow give it to me? There doesn't seem to be anything in the readme about it.

How can I get the index in the original list of a match returned by fuzzywuzzy?

Nathan Arthur

1 Answer

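I feel a bit dumb. `process.extract` returns a list of (match, score) tuples, not just the matched strings, so `bookSentences.index(match)` was searching the list for a tuple. Looking up `match[0]` instead fixes it: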

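for match in matches:
    # Each match is a (sentence, score) tuple; look up the sentence only
    matchIndex = bookSentences.index(match[0])
    # Clamp the window so it never runs off either end of the list
    start = max(matchIndex - 2, 0)
    end = min(matchIndex + 2, len(bookSentences))
    for index in range(start, end):
        print bookSentences[index],
    print '\n\n'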
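
One caveat with `list.index` is that it rescans the list and returns only the first occurrence, so duplicate sentences always resolve to the earliest one. A minimal sketch of carrying the index along with each score follows. It is written for Python 3, and the names `extract_with_index`, `context_window`, and `simple_ratio` are hypothetical helpers, not part of fuzzywuzzy. `simple_ratio` is a `difflib`-based stand-in scorer, assumed here only so the sketch runs without fuzzywuzzy installed.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer returning 0-100 (an assumption, not a fuzzywuzzy function)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract_with_index(needle, choices, scorer=simple_ratio, limit=3):
    """Return up to `limit` (index, text, score) tuples, best score first."""
    scored = [(i, text, scorer(needle, text)) for i, text in enumerate(choices)]
    # sort() is stable, so ties keep their original list order
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:limit]


def context_window(choices, index, before=2, after=1):
    """Return the sentences around `index`, clamped to the list bounds."""
    start = max(index - before, 0)
    end = min(index + after + 1, len(choices))
    return choices[start:end]


# Made-up sample data standing in for bookSentences
sentences = [
    "The cat sat on the mat.",
    "Dogs bark at night.",
    "Rain fell on the town.",
    "The train left at noon.",
    "Birds sing at dawn.",
]

for index, text, score in extract_with_index("the train left at noon", sentences, limit=1):
    print(index, score)
    print(" ".join(context_window(sentences, index)))
```

The default window (two sentences before, one after) mirrors `range(matchIndex-2, matchIndex+2)` above. Any callable taking two strings should work as `scorer`, so `fuzz.token_sort_ratio` could be passed in its place. fuzzywuzzy's own docstrings also describe `process.extract` returning 3-tuples that include the key when `choices` is a dictionary, so `dict(enumerate(bookSentences))` may give the index directly; that is worth checking against the installed version.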
Nathan Arthur
  • This only works for the `process.extract` method, and only because the match returned is guaranteed to be in the list. I'm using `fuzzywuzzy` to search for substrings in a long block of text with `fuzz.partial_ratio`, which only returns a score. I think I'm going to have to check out SequenceMatcher for my purposes. – Jim Wrubel Jul 06 '16 at 01:27
  • [This post](http://stackoverflow.com/a/31433394/721305) seems like a good option. – Jim Wrubel Jul 06 '16 at 01:36
  • This seems to add an unnecessary step. I wish fuzzywuzzy's `process` returned the tuple in the form (index, text, score). – Veggiet Oct 15 '20 at 18:22
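
For the substring case raised in the comments, where only a score comes back, here is a rough sketch of locating the best-matching span with the standard library's `difflib.SequenceMatcher`. The function name `best_fuzzy_span` is hypothetical, and the fixed-width sliding window is an assumption made for simplicity. It is not how `fuzz.partial_ratio` works internally and it will be slow on very long texts.

```python
from difflib import SequenceMatcher


def best_fuzzy_span(needle, haystack):
    """Return (start, end, score) for the haystack window most similar to needle."""
    width = len(needle)
    if width == 0 or len(haystack) < width:
        # Nothing to slide over: compare against the whole haystack once
        score = SequenceMatcher(None, needle, haystack).ratio()
        return 0, len(haystack), score

    best = (0, width, 0.0)
    for start in range(len(haystack) - width + 1):
        window = haystack[start:start + width]
        score = SequenceMatcher(None, needle, window).ratio()
        # Strict comparison keeps the earliest window on ties
        if score > best[2]:
            best = (start, start + width, score)
    return best


text = "the quick brown fox jumps over the lazy dog"
start, end, score = best_fuzzy_span("brown fax", text)
print(start, end, text[start:end])
```

The returned `start` and `end` are character offsets into the text, which can then be mapped back to a sentence if the sentence boundaries are known.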