
I am no expert in Lucene, but I want to modify its scoring to meet the following requirement.

I treat each sentence as a different document and index it in order to test the search.

For example,

Input sentences: Sam eats apples. Jeff eats oranges. Sam sam and sam eats apples and only apples. Jeff jeff and only jeff eats oranges oranges and only oranges.

Current Search

When I search for "Sam eats apples", Lucene currently scores "Sam sam and sam eats apples and only apples." the highest, because it contains the terms "sam", "eats" and "apples" more often, so that document appears at the top.

Modified Search (what I want)

Now, when I search for "Sam eats apples", I want the document "Sam eats apples" to score highest, because it is the exact match, including the word order of the query: Sam (first), eats (second) and apples (third).

What I have thought of doing: write my own custom query, weight and scorer (by extending the Query, Weight and Scorer classes).

Is this feasible, and is it worth the effort? Or are there other options?

Any suggestion would be valuable to me since I am just a beginner in Lucene.

Sujan
  • That's the toughest route, `Weight` is the place where many optimizations happen. Just try to have a look at one of the standard implementations, I promise you a dizzy spell. – Marko Topolnik Feb 11 '14 at 12:15
  • I also believe that the existing library already has enough to support your type of query. – Marko Topolnik Feb 11 '14 at 12:16
  • Have you checked out `PhraseQuery`? http://stackoverflow.com/questions/5527868/exact-phrase-search-using-lucene – Marko Topolnik Feb 11 '14 at 12:17
  • Not yet. I think I need to look at it. – Sujan Feb 11 '14 at 12:20
  • @MarkoTopolnik I tried `PhraseQuery` as you suggested. It did match the exact query but I needed the exact match at the top while others at the bottom. – Sujan Feb 12 '14 at 09:26

1 Answer

If you are combining the PhraseQuery suggested by @MarkoTopolnik with other queries, set its slop to 0 (which is the default), so that documents containing the search words in the given sequence are matched by that query. Then call setBoost on that PhraseQuery with a value greater than 1.0, which increases the score of the results it matches. I don't know whether this is the optimal solution for your need, but it worked for me in a similar situation. At first I had to use a boost of more than 4.0; later I discovered that there was room for improvement in my query combination. After optimising the BooleanQuery I used to combine the multiple queries, I was able to bring the boost values down to between 1.0 and 2.0.
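To make the effect of the boost easier to see, here is a small Lucene-free sketch in plain Java. It is a toy model only, not Lucene's actual scoring formula: the class `PhraseBoostSketch`, its methods, the second sample sentence and the boost value 3.0 are all made up for illustration. The "term" part of the score simply counts query-term occurrences (roughly what a set of optional term clauses rewards), and the "phrase" part adds the boost when the query words appear consecutively and in order (the slop 0 case).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model of "term clauses + boosted exact phrase clause". Not Lucene code.
final class PhraseBoostSketch {

    // Lowercase and split on non-letters; a crude stand-in for an analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how many document tokens are query terms (repeats count again).
    static int termHits(List<String> doc, List<String> query) {
        int hits = 0;
        for (String token : doc) {
            if (query.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    // True when the query tokens occur consecutively and in order (slop 0).
    static boolean containsPhrase(List<String> doc, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    // Hypothetical score: term hits, plus the boost if the exact phrase occurs.
    static double score(String doc, String query, double phraseBoost) {
        List<String> d = tokenize(doc);
        List<String> q = tokenize(query);
        return termHits(d, q) + (containsPhrase(d, q) ? phraseBoost : 0.0);
    }

    // Returns the documents sorted by descending score.
    static List<String> rank(List<String> docs, String query, double phraseBoost) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator
                .comparingDouble((String d) -> score(d, query, phraseBoost))
                .reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Apples, apples: Sam eats only apples.", // made-up sentence
                "Sam eats apples.",
                "Jeff eats oranges.");
        // Without a phrase boost the term-heavy sentence comes first.
        System.out.println(rank(docs, "Sam eats apples", 0.0));
        // With a boost of 3.0 the exact phrase match comes first.
        System.out.println(rank(docs, "Sam eats apples", 3.0));
    }
}
```

In this toy model the boost has to be large enough to outweigh the extra term hits of the longer sentence, which matches my experience of having to tune the value. One caveat worth checking on your data: a slop 0 phrase also matches inside a longer sentence. The third sentence in the question contains "sam eats apples" as consecutive words, so it would receive the phrase boost as well.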

Yogesh