Apache Lucene: How to get the first matching substring from a Document

Question

I could not find any information, on the web or on Stack Overflow, about how to get the first matching character subsequence from a Lucene Document.

At the moment I'm using this logic to retrieve results from Lucene:

        Document doc = searcher.doc(hit.doc);
        String text = doc.get("text");
        // Truncate to the first 80 characters, regardless of where the match is
        if (text.length() > 80) {
            text = text.substring(0, 80);
        }
        results.add(new SearchResult(doc.get("url"), doc.get("title"), text));

As you can see, this just takes the first 80 characters of the searched text and wraps them, together with some other data, in a SearchResult object.

Is it possible to retrieve the first, or even the highest-scoring, subsequence of the text that actually contains any of the search terms?

score 2 · Accepted Answer · edited May 23 '17 at 12:02

You need the Lucene Highlighter, which returns the best-scoring fragments of a text that contain the query terms.
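To make the idea concrete, here is a minimal, dependency-free sketch of what a snippet extractor does at its simplest: find the earliest occurrence of any search term and cut a window there, instead of always taking the first 80 characters. It is not Lucene's Highlighter, which works on analyzed tokens and scores fragments. The class name `SnippetSketch` and the method `firstMatchingSnippet` are hypothetical names chosen for illustration. With Lucene itself, the usual entry point is `Highlighter.getBestFragment(analyzer, fieldName, text)` from the `org.apache.lucene.search.highlight` package, though the exact setup depends on the Lucene version.

```java
import java.util.Arrays;
import java.util.List;

public class SnippetSketch {

    /**
     * Returns at most maxLen characters of text, starting at the earliest
     * case-insensitive occurrence of any of the given terms. If no term
     * occurs, falls back to the leading maxLen characters, which is what
     * the code in the question does.
     */
    public static String firstMatchingSnippet(String text, List<String> terms, int maxLen) {
        int start = 0; // fallback: beginning of the text
        search:
        for (int i = 0; i < text.length(); i++) {
            for (String term : terms) {
                // regionMatches compares in place, so indices stay valid
                if (!term.isEmpty() && text.regionMatches(true, i, term, 0, term.length())) {
                    start = i;
                    break search;
                }
            }
        }
        int end = Math.min(text.length(), start + maxLen);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. The highlighter picks fragments.";
        System.out.println(firstMatchingSnippet(text, Arrays.asList("highlighter"), 80));
    }
}
```

In the question's code, a call such as `firstMatchingSnippet(doc.get("text"), terms, 80)` could stand in for the `substring(0, 80)` step, assuming the raw search terms are available as a list. This only handles literal terms; for stemming, phrase queries, or scoring of fragments, the real Highlighter is the better fit.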

answered Oct 20 '10 at 14:39

ffriend

Also note that there are several Highlighter implementations for both Lucene 2.x and Lucene 3.0. Take the one that fits your task better. – ffriend Oct 20 '10 at 14:41

score 1 · Answer 2 · edited May 23 '17 at 10:33

This is called hit highlighting. The question is probably a duplicate of another highlighter question.

answered Oct 20 '10 at 14:39

Eugene Kuleshov
