In Lucene I use the SnowballAnalyzer for both indexing and searching.
Once the index is built, I run queries against it. For example, I query the field 'body' for 'specialized'. IndexSearcher returns documents containing 'specialize', 'specialized', etc., because of the stemming done by the SnowballAnalyzer.
Now, having the top documents, I want to get a text snippet from the body field. This snippet should contain the word that matches the stemmed version of the query word.
For example one of the returned documents has the body field: "Unfortunately, in some states, blind people only have access to general rehabilitation agencies, which serve people with a variety of disabilities. In these cases, specialized services for visually impaired people are not always available."
Then I wish to get the part 'In these cases, specialized services for visually' as the snippet.
Additionally, I want to get the terms from this snippet. This is how I want to do it; the line marked '?' is where I have a question:
IndexReader ir = IndexReader.open(fsDir);
TermPositionVector tv = (TermPositionVector) ir.getTermFreqVector(hits.scoreDocs[i].doc, "body");
// ? - here: 'query' has to be the indexed term. If the real query was 'specialized',
// then query should be its stemmed form (something like 'specialize'), which is what
// the Snowball analyzer normally produces. How can I get the term as analyzed by the
// analyzer, for a single word or for a phrase such as "specialized machines"?
int idx = tv.indexOf(query);
int[] idxs = tv.getTermPositions(idx);
for (String t : tv.getTerms()) {
    int iidx = tv.indexOf(t);
    int[] iidxs = tv.getTermPositions(iidx);
    for (int ni : idxs) {
        tmpValue = 0.0f;
        for (int nni : iidxs) {
            if (Math.abs(nni - ni) <= Settings.termWindowSize) {
Edit:
I found a way to get the stemmed term:
Query q = queryParser.parse("some text to be parsed");
String parsedQuery = q.toString();
The Query object also has a toString(String fieldName) method, which leaves out the field prefix for terms in the given field.
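Once the stemmed term is known, the window check from the loop above can be sketched without any Lucene classes. This is only a minimal illustration: the class name TermWindow, the method termsNear, and the sample terms and positions are all made up for the example, and the map of term to positions stands in for what getTerms() and getTermPositions() would supply from the term vector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
class TermWindow {

    // Returns every term that has at least one position within `window`
    // positions of any occurrence of queryTerm (the query term itself included).
    static Set<String> termsNear(Map<String, int[]> positions, String queryTerm, int window) {
        Set<String> result = new TreeSet<>();
        int[] queryPositions = positions.get(queryTerm);
        if (queryPositions == null) {
            return result; // the stemmed query term does not occur in this document
        }
        for (Map.Entry<String, int[]> entry : positions.entrySet()) {
            if (anyWithin(queryPositions, entry.getValue(), window)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // True if some position in a is at most `window` away from some position in b.
    static boolean anyWithin(int[] a, int[] b, int window) {
        for (int x : a) {
            for (int y : b) {
                if (Math.abs(x - y) <= window) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative stemmed terms and positions, invented for this sketch.
        Map<String, int[]> positions = new LinkedHashMap<>();
        positions.put("blind", new int[] {5});
        positions.put("case", new int[] {31});
        positions.put("special", new int[] {32});
        positions.put("servic", new int[] {33});
        System.out.println(termsNear(positions, "special", 2));
    }
}
```

For a phrase query, the same check would presumably be repeated for each stemmed term of the phrase and the results combined.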