Get stemmed word in Lucene

Question

In Lucene I use the SnowballAnalyzer for indexing and searching.

Once the index is built, I run queries against it. For example, I query 'specialized' on the field 'body'. Because of the stemming done by the SnowballAnalyzer, IndexSearcher returns documents containing 'specialize', 'specialized', etc.

Now, given the top documents, I want to get a text snippet from the body field. This snippet should contain the word that matched the stemmed version of the query word.
For example, one of the returned documents has this body field: "Unfortunately, in some states, blind people only have access to general rehabilitation agencies, which serve people with a variety of disabilities. In these cases, specialized services for visually impaired people are not always available." From it I want to get the part 'In these cases, specialized services for visually' as the snippet. I also want the terms in this snippet. Here is the code that should do it; the place where I have a question is marked with '?':

    IndexReader ir = IndexReader.open(fsDir);
    // The cast only works if "body" was indexed with term vectors that store positions
    TermPositionVector tv = (TermPositionVector) ir.getTermFreqVector(hits.scoreDocs[i].doc, "body");

? Here, query has to be the indexed term. So if the real query was 'specialized', the term I look up should be 'specialize', which is what the Snowball analyzer normally produces. How can I get the term produced by the analyzer for a single word, or the terms for a phrase, since a query can contain a phrase such as "specialized machines"?

    int idx = tv.indexOf(query);
    int[] idxs = tv.getTermPositions(idx);
    for (String t : tv.getTerms()) {
        int iidx = tv.indexOf(t);
        int[] iidxs = tv.getTermPositions(iidx);
        for (int ni : idxs) {
            tmpValue = 0.0f;
            for (int nni : iidxs) {
                if (Math.abs(nni - ni) <= Settings.termWindowSize) {
                    // ... (the rest of the code is cut off in the original post)
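The loop above collects terms whose positions lie within a window of the query term's positions. A plain-Java sketch of the same idea, with a Map<String, int[]> standing in for TermPositionVector (the class name, method name, window size and sample terms are hypothetical; the terms are illustrative, not real Snowball output):

```java
import java.util.*;

/**
 * Sketch of the term-vector window idea: given a map from (already analyzed)
 * terms to their token positions, collect every term occurring within
 * `window` positions of any occurrence of the query term.
 * The map is a stand-in for a Lucene TermPositionVector, not Lucene's API.
 */
public class TermWindow {

    static SortedMap<Integer, String> termsNear(Map<String, int[]> positions,
                                                String queryTerm, int window) {
        SortedMap<Integer, String> result = new TreeMap<>();
        int[] hits = positions.get(queryTerm);
        if (hits == null) {
            return result; // query term does not occur in this document
        }
        for (Map.Entry<String, int[]> e : positions.entrySet()) {
            for (int p : e.getValue()) {
                for (int h : hits) {
                    if (Math.abs(p - h) <= window) {
                        // keyed by position, so values come out in text order
                        // (assumes one term per position)
                        result.put(p, e.getKey());
                        break;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical analyzed terms for "In these cases, specialized services for visually impaired ..."
        // (stop words removed, so positions 0, 1 and 5 are gaps)
        Map<String, int[]> positions = new HashMap<>();
        positions.put("case", new int[] {2});
        positions.put("special", new int[] {3});
        positions.put("servic", new int[] {4});
        positions.put("visual", new int[] {6});
        positions.put("impair", new int[] {7});
        System.out.println(termsNear(positions, "special", 3).values()); // prints [case, special, servic, visual]
    }
}
```

Keying the result by position keeps the snippet terms in document order, which the nested loops in the question do not guarantee on their own.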

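Edit:
I found a way to get the stemmed term:

    // queryParser must be built with the same SnowballAnalyzer used for indexing
    Query q = queryParser.parse("some text to be parsed");
    String parsedQuery = q.toString();

Query also has a toString(String field) method, which treats the given field as the default and omits its name, so only the terms remain.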
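The underlying requirement is that the query text goes through the same analysis chain that was used at index time. A plain-Java illustration of that principle, where toyAnalyze is a hypothetical stand-in (lowercasing plus a crude suffix stripper), not the Snowball stemmer, so its stems differ from what SnowballAnalyzer produces:

```java
import java.util.*;

/**
 * Illustration only: one analysis function is applied both to the document
 * text and to the query text, so a query word and a document word with the
 * same stem produce the same term. toyAnalyze is hypothetical and is NOT
 * the Snowball algorithm.
 */
public class SameAnalysis {

    static List<String> toyAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (tok.isEmpty()) continue; // e.g. leading quote or punctuation
            terms.add(toyStem(tok));
        }
        return terms;
    }

    // Hypothetical suffix stripper: removes "ed", "es" or "s" from longer words.
    static String toyStem(String w) {
        for (String suffix : new String[] {"ed", "es", "s"}) {
            if (w.length() > suffix.length() + 3 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        List<String> docTerms = toyAnalyze("In these cases, specialized services");
        List<String> queryTerms = toyAnalyze("\"specialized machines\"");
        System.out.println(queryTerms);                          // prints [specializ, machin]
        System.out.println(docTerms.contains(queryTerms.get(0))); // prints true
    }
}
```

In Lucene itself the equivalent is to run the query text through the same Analyzer, for example via the QueryParser as in the edit above.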

Accepted Answer

I believe you are mixing several questions. First, to see the stemmed version of your query, and other useful information, you can use the IndexSearcher's explain() method. Please see my answer to this question.

The Lucene solution for getting snippets is the Highlighter. Another option is the FastVectorHighlighter. I believe you can customize both to get the stemmed term rather than the full one.
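Roughly, a highlighter maps matching tokens back to character offsets in the stored text, which is why a snippet can keep the original word ('specialized') while matching on the stem. A simplified plain-Java sketch of that idea (not Lucene's Highlighter API; the class name, the toy analysis function and the 3-token window are assumptions):

```java
import java.util.*;
import java.util.function.Function;
import java.util.regex.*;

/**
 * Simplified stand-in for what a highlighter does: tokenize the stored text
 * while remembering character offsets, analyze each token, and cut a
 * fragment of the ORIGINAL text around the first token whose analyzed form
 * equals the analyzed query term.
 */
public class OffsetSnippet {

    static String snippet(String text, String queryTerm,
                          Function<String, String> analyze, int before, int after) {
        List<int[]> spans = new ArrayList<>(); // [start, end) offsets of each token
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        String target = analyze.apply(queryTerm);
        for (int i = 0; i < spans.size(); i++) {
            String token = text.substring(spans.get(i)[0], spans.get(i)[1]);
            if (analyze.apply(token).equals(target)) {
                int first = Math.max(0, i - before);
                int last = Math.min(spans.size() - 1, i + after);
                return text.substring(spans.get(first)[0], spans.get(last)[1]);
            }
        }
        return null; // no token matched the query term
    }

    public static void main(String[] args) {
        String body = "Unfortunately, in some states, blind people only have access to general "
                + "rehabilitation agencies, which serve people with a variety of disabilities. "
                + "In these cases, specialized services for visually impaired people are not always available.";
        // Hypothetical analysis: lowercase and strip a trailing "ed" or "e" (NOT Snowball).
        Function<String, String> analyze =
                w -> w.toLowerCase(Locale.ROOT).replaceAll("(ed|e)$", "");
        // prints: In these cases, specialized services for visually
        System.out.println(snippet(body, "specialize", analyze, 3, 3));
    }
}
```

Because the fragment is cut from the original text by offsets, the snippet shows 'specialized' even though the match was made on the analyzed form.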

answered Nov 21 '10 at 09:33 by Yuval F (edited May 23 '17 at 12:29 by Community)

Thanks for your reply. Please see the update to my post for a way to get the stemmed term. – Jakub Nov 21 '10 at 19:13
