However, exact match cases do not receive precedence in the search results:
Just use two queries instead of one:
EDIT: you will also need to set up two separate fields for autocomplete and "exact" match; see my edit at the bottom.
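```java
// qb is the Hibernate Search QueryBuilder for your entity;
// booleanQuery is the boolean query from your original code.
org.apache.lucene.search.Query exactSearchByName = qb.keyword().onField("name")
        .matching(userInput).createQuery();
org.apache.lucene.search.Query fuzzySearchByName = qb.keyword().fuzzy()
        .withEditDistanceUpTo(1).onField("name")
        .matching(userInput).createQuery();
// A document must match at least one clause; matching both adds up the scores.
// Note: the DSL method is bool(), since "boolean" is a Java keyword.
org.apache.lucene.search.Query searchByName = qb.bool()
        .should(exactSearchByName).should(fuzzySearchByName).createQuery();
booleanQuery.add(searchByName, BooleanClause.Occur.MUST);
```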
This will match documents that contain the user input exactly or approximately, so this will match the same documents as your example. However, documents that contain the user input exactly will match both queries, while documents that only contain something similar will only match the fuzzy query. As a result, exact matches will have a higher score and end up higher up in the result list.
If exact matches still don't rank high enough, try adding a boost to the exactSearchByName query:
```java
// The boost is set on the field context, i.e. before matching()
org.apache.lucene.search.Query exactSearchByName = qb.keyword().onField("name")
        .boostedTo(4.0f)
        .matching(userInput)
        .createQuery();
```
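To see why the extra `should` clause (and its boost) pushes exact matches up, here is a toy model in plain Java. It is not Hibernate Search or Lucene code: it assumes each matching clause simply adds a fixed amount (`exactBoost` for the exact clause, 1 for the fuzzy clause) and ignores everything else Lucene's scoring considers (term frequency, IDF, field length). It uses plain Levenshtein distance as a stand-in for `withEditDistanceUpTo(1)`, whereas Lucene also counts a transposition as a single edit by default. The sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShouldScoringSketch {

    // Plain Levenshtein distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Toy score: each matching "should" clause adds its weight; 0 means no clause matched.
    static double score(String name, String input, double exactBoost) {
        boolean exact = false;
        boolean fuzzy = false;
        for (String token : name.toLowerCase().split("\\s+")) {
            if (token.equals(input)) exact = true;
            if (editDistance(token, input) <= 1) fuzzy = true;
        }
        double s = 0;
        if (exact) s += exactBoost;
        if (fuzzy) s += 1.0;
        return s;
    }

    // Keeps documents matching at least one clause, sorted by descending toy score.
    static List<String> rank(List<String> names, String input, double exactBoost) {
        List<String> hits = new ArrayList<>();
        for (String n : names) {
            if (score(n, input, exactBoost) > 0) hits.add(n);
        }
        hits.sort(Comparator.comparingDouble((String n) -> score(n, input, exactBoost)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Jon Snow", "John Smith", "Joan Doe", "Mary Major");
        System.out.println(rank(names, "john", 4.0)); // [John Smith, Jon Snow, Joan Doe]
    }
}
```

"John Smith" matches both clauses, so it outranks the fuzzy-only matches; "Mary Major" matches neither and is excluded.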
I guess, however, that this contradicts the minGramSize of 1 and the WhitespaceTokenizerFactory?
If you want to match documents that contain any word (but not necessarily all words) appearing in the user input, and to put documents containing more words higher in the result list, do what I explained above.
If you want to match documents that contain all words in the exact same order, use a KeywordTokenizerFactory (i.e. no tokenizing).
If you want to match documents that contain all words in any order, well... that's less obvious. There's no support for that in Hibernate Search (yet), so you will essentially have to build the query yourself. One hack that I've already seen is something like this:
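```java
// "myAnalyzer" is the name of your analyzer definition
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer( "myAnalyzer" );
QueryParser queryParser = new QueryParser( "name", analyzer );
queryParser.setDefaultOperator( QueryParser.Operator.AND ); // Match *all* terms
Query luceneQuery = queryParser.parse( userInput ); // throws ParseException on invalid syntax
```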
... but that will not generate fuzzy queries. If you want fuzzy queries, you can try to override some methods in a custom subclass of QueryParser. I didn't try this, but it might work:
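```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

public final class FuzzyQueryParser extends QueryParser {
    private final int maxEditDistance; // FuzzyQuery supports at most 2
    private final int prefixLength;    // leading characters that must match exactly

    public FuzzyQueryParser(String fieldName, Analyzer analyzer, int maxEditDistance, int prefixLength) {
        super( fieldName, analyzer );
        this.maxEditDistance = maxEditDistance;
        this.prefixLength = prefixLength;
    }

    // Called for each analyzed term: build a fuzzy query instead of an exact term query
    @Override
    protected Query newTermQuery(Term term) {
        return new FuzzyQuery( term, maxEditDistance, prefixLength );
    }
}
```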
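For intuition about what this parser combined with `Operator.AND` is meant to match, here is a plain-Java model (not Lucene code, and untested against Lucene): every word of the user input must be within `maxEdits` edits of some word of the document, in any order, and must share its first `prefixLength` characters with that word, which is what `FuzzyQuery`'s prefix length enforces. Tokenization is simplified to lowercase whitespace splitting, and the distance is plain Levenshtein, whereas Lucene also counts a transposition as one edit by default. The sample documents are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FuzzyAndSketch {

    // Plain Levenshtein distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Exact common prefix of prefixLength characters, then at most maxEdits edits overall.
    static boolean fuzzyMatches(String term, String token, int maxEdits, int prefixLength) {
        int p = Math.min(prefixLength, term.length());
        if (!token.startsWith(term.substring(0, p))) return false;
        return editDistance(term, token) <= maxEdits;
    }

    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // AND semantics: every query term must fuzzy-match at least one document token.
    static boolean matchesAll(String document, String userInput, int maxEdits, int prefixLength) {
        List<String> docTokens = tokens(document);
        for (String term : tokens(userInput)) {
            boolean found = false;
            for (String token : docTokens) {
                if (fuzzyMatches(term, token, maxEdits, prefixLength)) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("John Smith", "Smith Jon", "John Doe");
        List<String> hits = docs.stream()
                .filter(d -> matchesAll(d, "smith john", 1, 1))
                .collect(Collectors.toList());
        System.out.println(hits); // [John Smith, Smith Jon]
    }
}
```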
EDIT: With a minGramSize of 1, you will get lots of very frequent terms: one- or two-character terms extracted from the beginning of words. This is likely to cause many unwanted matches that will be scored high (because the terms are frequent) and will probably drown out exact matches.
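To make this concrete, here is a plain-Java approximation of the terms an edge n-gram filter emits for each word (a sketch, not Lucene's `EdgeNGramFilterFactory` itself; the `maxGram` of 10 is a hypothetical value). With a minimum of 1, every word starting with "j" contributes the term "j", every word starting with "s" contributes "s", and so on; a minimum of 3, for comparison, drops those very short terms.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {

    // Emits the prefixes of each whitespace-separated word whose length is in [minGram, maxGram].
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
                grams.add(word.substring(0, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("John Smith", 1, 10));
        // [j, jo, joh, john, s, sm, smi, smit, smith]
        System.out.println(edgeNGrams("John Smith", 3, 10));
        // [joh, john, smi, smit, smith]
    }
}
```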
First, you can try setting the similarity (roughly, the scoring formula) to org.apache.lucene.search.similarities.BM25Similarity, which is better at ignoring very frequent terms; the Hibernate Search documentation explains how to configure the similarity. That should improve scoring without changing the analyzers.
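For a rough idea of why this helps, the sketch below compares the inverse document frequency (IDF) weight of a rare term and of a term present in almost every document. It assumes the IDF formulas used by `ClassicSimilarity` (`1 + ln((docCount + 1) / (docFreq + 1))`) and `BM25Similarity` (`ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`) in recent Lucene versions; details vary between versions, and the document counts are made up. The classic IDF never drops below 1, while the BM25 IDF of a near-ubiquitous term approaches 0, so such terms contribute almost nothing to the BM25 score.

```java
public class IdfSketch {

    // IDF as in Lucene's ClassicSimilarity (recent versions); never below 1.
    static double classicIdf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    // IDF as in Lucene's BM25Similarity; close to 0 when docFreq approaches docCount.
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 1000;
        // A rare term vs. a term found in 990 of the 1000 documents.
        for (long docFreq : new long[] {1, 990}) {
            System.out.printf("docFreq=%d classic=%.3f bm25=%.3f%n",
                    docFreq, classicIdf(docFreq, docCount), bm25Idf(docFreq, docCount));
        }
    }
}
```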
Second, you can try setting up two fields instead of one: one for fuzzy autocomplete and one for non-fuzzy, complete matches. That may improve the score of exact matches, since fewer meaningless terms will be indexed in the field used for exact matches. Just do this:
```java
// If @Field is not repeatable in your version, wrap both in @Fields({ ... })
@Field(name = "name", analyzer = @Analyzer(definition = "text"))
@Field(name = "name_autocomplete", analyzer = @Analyzer(definition = "edgeNgram"))
private String name;
```
The analyzer "text" is just the analyzer "edgeNGram_query" from the answer you linked; just rename it.
Then proceed with writing two queries instead of one, as explained above, but make sure to target two different fields:
```java
// Exact clause on the "text"-analyzed field
org.apache.lucene.search.Query exactSearchByName = qb.keyword().onField("name")
        .matching(userInput).createQuery();
// Fuzzy clause on the edge n-gram field used for autocomplete
org.apache.lucene.search.Query fuzzySearchByName = qb.keyword().fuzzy()
        .withEditDistanceUpTo(1).onField("name_autocomplete")
        .matching(userInput).createQuery();
org.apache.lucene.search.Query searchByName = qb.bool()
        .should(exactSearchByName).should(fuzzySearchByName).createQuery();
booleanQuery.add(searchByName, BooleanClause.Occur.MUST);
```
Don't forget to reindex after those changes, of course.