Get terms present in a document with a collection

Asked May 19 '14 at 14:56

Active May 19 '14 at 15:16

I'm developing a function that finds which terms occur in a document. The function takes a HashSet of String as its parameter. I iterate over the HashSet, analyze each string (with the Lucene Analyzer class), then search the text for the analyzed string with the PhraseQuery class to find out whether it exists in the document. The function returns a HashSet containing only the terms that were found in the document.

It works, but slowly, because I iterate over the whole HashSet and run one query per string. Is there a way to give Lucene a collection of words and get back a collection containing only the words that occur in the document?
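To make the desired behaviour concrete, here is a minimal plain-Java sketch of the idea, not Lucene code. It assumes a trivial stand-in analyzer (lowercase, split on anything that is not a letter or digit) in place of a real Lucene Analyzer, and the names `TermMatcher`, `analyze`, and `findPresentTerms` are hypothetical. The document is analyzed once, and each word sequence in it is looked up in a map of analyzed candidates, so no query is run per term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class TermMatcher {

    // Stand-in for a Lucene Analyzer (assumption): lowercase the text and
    // split it on every run of characters that are not letters or digits.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns the candidates (single words or phrases) that occur in the document.
    public static Set<String> findPresentTerms(Set<String> candidates, String document) {
        // Analyze every candidate once; key = analyzed form, value = original strings.
        Map<String, List<String>> byAnalyzedForm = new HashMap<>();
        int maxPhraseLength = 0;
        for (String candidate : candidates) {
            List<String> tokens = analyze(candidate);
            if (tokens.isEmpty()) {
                continue;
            }
            maxPhraseLength = Math.max(maxPhraseLength, tokens.size());
            byAnalyzedForm
                    .computeIfAbsent(String.join(" ", tokens), key -> new ArrayList<>())
                    .add(candidate);
        }

        // Analyze the document once, then look up each run of consecutive
        // tokens (up to the longest candidate phrase) in the map.
        List<String> docTokens = analyze(document);
        Set<String> found = new HashSet<>();
        for (int start = 0; start < docTokens.size(); start++) {
            StringBuilder gram = new StringBuilder();
            for (int len = 1; len <= maxPhraseLength && start + len <= docTokens.size(); len++) {
                if (len > 1) {
                    gram.append(' ');
                }
                gram.append(docTokens.get(start + len - 1));
                List<String> hits = byAnalyzedForm.get(gram.toString());
                if (hits != null) {
                    found.addAll(hits);
                }
            }
        }
        return found;
    }
}
```

With this sketch, calling `TermMatcher.findPresentTerms(terms, text)` costs one pass over the document with hash lookups, instead of one phrase search per candidate. Whether Lucene offers an equivalent single call is exactly what the question asks, so this only illustrates the expected input and output.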

edited May 19 '14 at 15:16

Chris Mantle

asked May 19 '14 at 14:56

taubhi

Wow! I was just asking almost exactly the same question: "Let's say I have 100 (possibly multi-word) strings and I want to ask Lucene which of these terms are present in a particular document. In other words, I want to get an intersection of query terms vs a document. Is it possible? Is it a valid use case for Lucene?" – Marcin May 19 '14 at 14:59

I guess this question was already asked and answered here: http://stackoverflow.com/questions/7896183/get-matched-terms-from-lucene-query – Marcin May 20 '14 at 07:41
Thank you very much, I hadn't found that question! It led me to this other good answer: http://stackoverflow.com/questions/2851473/lucene-get-matched-terms-in-query Thanks again! – taubhi May 20 '14 at 08:36

0 Answers