Hibernate Full Text Search - order results by relevance

Question

I am trying to run a full-text query with Hibernate Search 5.5.0.Final. (I have already tried the most recent version, but it doesn't work, possibly because of the old version of Hibernate I'm using, 5.0.12.)

The final result that I would like to obtain is the following:

Display at the top of the list the results that match on the description field, with the following logic:
    (Let's assume a user is searching for "Milk")
    - Results having the word at the beginning (Milk UHT)
    - Results having the word in second or third position (Chocolate Milk)
    - Results having the word in a phrase (MilkShake)
Then display the results that match on the tags field (Lactose free, Gluten Free, etc.)

This is what I've done so far:

    FullTextEntityManager fullTextEntityManager
            = Search.getFullTextEntityManager(entityManager);
    fullTextEntityManager.createIndexer().startAndWait();


    FullTextEntityManager fullTextEntityManager2
            = Search.getFullTextEntityManager(entityManager);

    QueryBuilder queryBuilder = fullTextEntityManager2.getSearchFactory()
            .buildQueryBuilder()
            .forEntity(ProductEntity.class)
            .get();


    Query myQuery = queryBuilder
            .bool()
            .should(queryBuilder.keyword()
                    .onField("description").boostedTo(9l).matching(query)
                    .createQuery())
            .should(queryBuilder.phrase()
                    .onField("description").boostedTo(5l).sentence(query)
                    .createQuery())

            .should(queryBuilder.keyword()
                    .onField("tags").boostedTo(3l).matching(query)
                    .createQuery())
            .should(queryBuilder.phrase()
                    .onField("tags").boostedTo(1l).sentence(query)
                    .createQuery())

            .createQuery();


    org.hibernate.search.jpa.FullTextQuery jpaQuery
            = fullTextEntityManager.createFullTextQuery(myQuery, ProductEntity.class);

    return jpaQuery.getResultList();

I've been reading a lot on the internet but still I cannot get the desired result. Is this even possible? Can you give me a hint?

Thanks in advance

yrodiere · Answer 1 · 2020-06-30T12:38:26.897

First, know that the boost is not a constant weight assigned to each query; rather, it's a multiplier. So when you set the boost to 1 on query #4 and to 3 on query #3, it's theoretically possible that query #4 ends up with a higher "boosted score" if its base score is more than three times that of query #3. To avoid that kind of problem, you can mark the score of each query as constant (use .boostedTo(3l).withConstantScore().onField("tags") instead of .onField("tags").boostedTo(3l)).
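
The arithmetic behind that first point can be sketched in plain Java. This is not Hibernate Search or Lucene code: `BoostSketch` and its methods are hypothetical names, and the base scores used in the usage note are made-up numbers chosen only to show the effect. It assumes that a constant-score query scores every match as exactly its boost.

```java
// Hypothetical illustration of boost-as-multiplier versus constant score.
class BoostSketch {
    // Regular query: the boost multiplies whatever base score the match obtained.
    static double boosted(double baseScore, double boost) {
        return baseScore * boost;
    }

    // Constant-score query (assumption): the base score is ignored,
    // so every match scores exactly the boost.
    static double constantScore(double boost) {
        return boost;
    }
}
```

With an assumed base score of 0.2 for the clause boosted to 3 and 0.7 for the clause boosted to 1, `boosted(0.7, 1)` is larger than `boosted(0.2, 3)`, so the lower-priority clause wins. With constant scores, `constantScore(3)` is always larger than `constantScore(1)`.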

Second, the phrase query is not what you think it is. The phrase query accepts a multi-term input string, and will look for documents that contain these terms in the same order. Since you passed a single term, it's pointless. So you need something else.
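
To see why a single-term phrase adds nothing over a keyword match, here is a plain-Java sketch of phrase matching on already-tokenized text. It only imitates the idea (terms must appear consecutively and in order) and is not how Lucene implements it; `PhraseSketch` and `containsPhrase` are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of phrase matching over token lists.
class PhraseSketch {
    // True if phraseTokens occur in docTokens consecutively and in the same order.
    static boolean containsPhrase(List<String> docTokens, List<String> phraseTokens) {
        if (phraseTokens.isEmpty()) {
            return false; // nothing to match
        }
        return Collections.indexOfSubList(docTokens, phraseTokens) >= 0;
    }
}
```

With a one-element phrase such as `["milk"]`, this degenerates to "does the document contain the token", which is exactly what the keyword query already checks. The order only starts to matter with inputs like `["olive", "oil"]`.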

Query 1: Results having the word at the beginning

I believe the only way to do exactly what you want is to use span queries. However, they are not part of the Hibernate Search DSL, so you'll have to rely on low-level Lucene APIs. What's more, I've never used them, and I'm not sure how they are supposed to be used... What little I know was taken from Elasticsearch's documentation, as the Lucene documentation is severely lacking.

You can try something like this, but if it doesn't work you'll have to debug it yourself (I don't know more than you do):

    QueryBuilder queryBuilder = fullTextEntityManager2.getSearchFactory()
            .buildQueryBuilder()
            .forEntity(ProductEntity.class)
            .get();
    Analyzer analyzer = fullTextEntityManager.getSearchFactory()
            .getAnalyzer(ProductEntity.class);

    Query myQuery = queryBuilder
            .bool()
            .should(new BoostQuery(new ConstantScoreQuery(createSpanQuery(queryBuilder, "description", query, analyzer)), 9L))
            [... add other clauses here...]
            .createQuery();

// Other methods (to be added to the same class)

    private static Query createSpanQuery(QueryBuilder qb, String fieldName, String searchTerms, Analyzer analyzer) {
        BooleanJunction bool = qb.bool();
        List<String> terms = analyze(fieldName, searchTerms, analyzer);
        for (int i = 0; i < terms.size(); ++i) {
            // The end position is exclusive, so a term at position i occupies [i, i + 1)
            bool.must(new SpanPositionRangeQuery(new SpanTermQuery(new Term(fieldName, terms.get(i))), i, i + 1));
        }
        return bool.createQuery();
    }

    private static List<String> analyze(String fieldName, String searchTerms, Analyzer analyzer) {
        List<String> terms = new ArrayList<String>();
        try {
            // Reader here is java.io.Reader, not org.apache.lucene.util.PagedBytes.Reader
            final Reader reader = new StringReader( searchTerms );
            final TokenStream stream = analyzer.tokenStream( fieldName, reader );
            try {
                CharTermAttribute attribute = stream.addAttribute( CharTermAttribute.class );
                stream.reset();
                while ( stream.incrementToken() ) {
                    if ( attribute.length() > 0 ) {
                        String term = new String( attribute.buffer(), 0, attribute.length() );
                        terms.add( term );
                    }
                }
                stream.end();
            }
            finally {
                stream.close();
            }
        }
        catch (IOException e) {
            throw new IllegalStateException( "Unexpected exception while analyzing search terms", e );
        }
        return terms;
    }

Query 2: Results having the word in second or third position

I believe you can use the same code as for query 1, but adding an offset. If the actual position doesn't matter, and you'll accept words in fourth or fifth position, you can simply do this:

queryBuilder.keyword().boostedTo(5l).withConstantScore()
        .onField("description").matching(query)
        .createQuery()
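
If the exact position does matter, the offset becomes a start/end pair. As far as the Lucene source shows, `SpanPositionRangeQuery(match, start, end)` accepts a span whose start is at or after `start` and whose end is at or before `end`, where a span's end is one past its last token. The plain-Java sketch below only simulates that rule for a single term on a token list; it does not call Lucene, and `PositionRangeSketch` and `termInRange` are hypothetical names.

```java
import java.util.List;

// Hypothetical simulation of a position-range check for a single term.
class PositionRangeSketch {
    // True if term occurs at a 0-based position p with start <= p and p + 1 <= end,
    // i.e. start is inclusive and end is exclusive.
    static boolean termInRange(List<String> tokens, String term, int start, int end) {
        for (int p = Math.max(start, 0); p < end && p < tokens.size(); p++) {
            if (tokens.get(p).equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

Under that reading, "first position" is the range (0, 1) and "second or third position" is the range (1, 3). A range whose start equals its end, such as (0, 0), can never match a term.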

Query 3: Results having the word in a phrase (MilkShake)

From what I understand, you mean "results containing a word that contains the search term".

You could use wildcard queries for that, but unfortunately these queries do not apply analyzers, resulting in case-sensitive search (among other problems).

Your best bet is probably to define a separate field for this query, e.g. description_ngram, and assign a specially-crafted analyzer to it, one which uses the ngram tokenizer when indexing. The ngram tokenizer simply takes an input string and transforms it to all its substrings: "milkshake" would become ["m", "mi", "mil", "milk", ..., "milkshake", "i", "il", "ilk", "ilks", "ilksh", ... "ilkshake", "l", ... "lkshake", ..., "ke", "e"]. Obviously it takes a lot of disk space, but it can work for small-ish datasets. You will find instructions for a similar use case here. The answer mentions a different analyzer, "edgengram", but in your case you'll really want to use the "ngram" analyzer.
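
To get a feel for what the ngram approach indexes (and why it is costly), here is a plain-Java sketch that enumerates the substrings of a word. It is not the Lucene tokenizer, only an approximation of its output; `NGramSketch`, `ngrams`, and the `minGram`/`maxGram` parameter names are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of what an ngram tokenizer emits for one word.
class NGramSketch {
    // Returns every substring of text whose length is between minGram and maxGram.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

For the 9-letter word "milkshake" with lengths 1 to 9 this yields 45 entries, among them "milk" and "shake", so a search for "milk" finds the document. Raising the minimum length cuts the index size at the cost of not matching very short search terms.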

Alternatively, if you're sure the indexed text is correctly formatted to clearly separate components of a "composite" word (e.g. "milk-shake", "MilkShake", ...), you can simply create a field (e.g. description_worddelimiterfilter) that uses an analyzer with a word-delimiter filter (see org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter) which will split these composite words. Then you can simply query like this:

queryBuilder.keyword().boostedTo(3l).withConstantScore()
        .onField("description_worddelimiterfilter")
        .matching(query)
        .createQuery()
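
What such a filter does to composite words can be approximated in plain Java. This sketch is not the Lucene `WordDelimiterFilter` and ignores most of its options; it only splits on lower-to-upper case changes and on non-alphanumeric characters, then lowercases. `WordDelimiterSketch` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical approximation of splitting composite words into their parts.
class WordDelimiterSketch {
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        // Split between a lowercase and an uppercase letter, or on any run of delimiters.
        for (String part : word.split("(?<=[a-z])(?=[A-Z])|[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }
}
```

"MilkShake" and "milk-shake" both come out as "milk" and "shake", so a keyword search for "milk" matches them. "milkshake" stays a single token, which is why this approach only works when the indexed text marks the word boundaries.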

Hi, thanks for your reply. I thought my idea would be easier to implement. After all, it's a basic search. I'm trying to implement what you have suggested. — user3187960, Jun 30 '20 at 11:19
I have the following errors. First: `Query myQuery = queryBuilder.bool().should(new BoostQuery(new ConstantScoreQuery(createSpanQuery(queryBuilder, "description", query, analyzer)), 9L));` Required: `org.apache.lucene.search.Query`, found: `org.hibernate.search.query.dsl.BooleanJunction<>`. Second: `final PagedBytes.Reader reader = new StringReader(searchTerms);` Required: `org.apache.lucene.util.PagedBytes.Reader`, found: `java.io.StringReader`. — user3187960, Jun 30 '20 at 11:22
You need to add your other clauses and end the statement with `.createQuery()`. I edited my answer to clarify that. — yrodiere, Jun 30 '20 at 12:40
Also, while it's a relatively basic search, you're also attempting to do things that are quite unusual. If you can, I'd suggest you challenge your requirements and try to find a less exotic definition of your search query. The bits about requiring a specific position for the terms, in particular, are quite unusual (I've never seen this kind of requirement). — yrodiere, Jun 30 '20 at 12:41
The only thing I am trying to do is order results by relevance. If I search for "Olive Oil", I would like an olive oil bottle to appear before "Tuna with olive oil"; or, for "Pasta BrandName", all the articles of that brand to be at the top of the results. At the moment this is not happening. — user3187960, Jun 30 '20 at 12:55
Sure. If you want this, the span queries should do the trick... to a point. "Bottle of extra-virgin olive oil" may still appear after "Tuna with olive oil", for example. Scoring is great at pushing the best results to the first page of results, but fine-tuning the order of results within that page will be more difficult. Did the span query help, at least? — yrodiere, Jun 30 '20 at 16:16
Your error is caused by an invalid import. Pick the right `Reader` class in your imports. — yrodiere, Jul 01 '20 at 09:29
