
The various examples I've seen of how to find the positions of the matches returned by an IndexSearcher either require retrieving the document's content and searching a TokenStream, or indexing positions and offsets in the term vectors, turning the query into a term and finding that term in the term vector. But what happens when I use a FuzzyQuery? Is there a way to know exactly which term(s) matched in a hit, so that I can look for them in that document's term vector?

In case it's of any value: I'm new to Lucene, and my goal is to annotate a set of documents (the ones indexed in Lucene) with a set of terms. The documents come from scanned texts and contain OCR errors, so I must use a FuzzyQuery. I thought about using lucene-suggest to do some spellchecking beforehand, but it occurred to me that this boiled down to finding fuzzy matches anyway.
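To make the "which terms matched" step concrete, here is a minimal, library-free sketch. It assumes the document's terms and their start offsets have already been read out of its term vector into a plain list, and it reports which of those terms fall within a given edit distance of the query term. The class `FuzzyTermMatcher`, the `TermHit` type and both methods are hypothetical illustrations, not Lucene API. The distance used is optimal string alignment (Levenshtein plus adjacent transpositions), assumed here to approximate what FuzzyQuery does with its Lucene 4.x defaults (at most 2 edits, a transposition counting as one edit); it does not reproduce FuzzyQuery's prefix length, expansion limits or scoring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: finds which document terms
// lie within a bounded edit distance of a query term.
class FuzzyTermMatcher {

    // A document term with its start offset, as a term vector might store it.
    static final class TermHit {
        private final String term;
        private final int startOffset;

        TermHit(String term, int startOffset) {
            this.term = term;
            this.startOffset = startOffset;
        }

        String term() { return term; }
        int startOffset() { return startOffset; }
    }

    // Optimal string alignment distance: insertions, deletions,
    // substitutions and adjacent transpositions each cost 1.
    static int osaDistance(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "er" vs "re".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // Returns every document term within maxEdits of the query term.
    static List<TermHit> matchingTerms(String query, List<TermHit> docTerms, int maxEdits) {
        List<TermHit> hits = new ArrayList<>();
        for (TermHit t : docTerms) {
            if (osaDistance(query, t.term()) <= maxEdits) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical OCR-damaged text: "the govemment decided govrenment".
        List<TermHit> doc = List.of(
                new TermHit("the", 0),
                new TermHit("govemment", 4),   // "rn" misread as "m": 2 edits
                new TermHit("decided", 14),
                new TermHit("govrenment", 22)  // "er" transposed: 1 edit
        );
        for (TermHit h : matchingTerms("government", doc, 2)) {
            System.out.println(h.term() + " @ " + h.startOffset());
        }
    }
}
```

Running `main` prints `govemment @ 4` and `govrenment @ 22`, i.e. the OCR variants that matched together with the offsets where they could be annotated. Comparing against every term of every hit is only reasonable for small term vectors; on a real index one would more likely enumerate the terms the fuzzy query expands to once and then look those up per document.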

Yves Parès
  • possible duplicate of [using hit highlighter in lucene](http://stackoverflow.com/questions/2409870/using-hit-highlighter-in-lucene) – Mark Leighton Fisher Aug 12 '14 at 16:48
  • Indeed, I missed the explain function. The big problem is that an Explanation is essentially just a String; I can't access a collection of the matching Terms, for instance. – Yves Parès Aug 12 '14 at 17:19
