Use a custom analyzer for the matching term with Hibernate Search

Question

I have a field which has a custom analyzer.

@Analyzer(definition = "edgeNgram")
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
@Lob
String value;

Here is the analyzer on my class.

@AnalyzerDef(name = "edgeNgram",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
                @TokenFilterDef(factory = LowerCaseFilterFactory.class), // Lowercase all characters
                @TokenFilterDef(
                        factory = EdgeNGramFilterFactory.class, // Generate prefix tokens
                        params = {
                                @org.hibernate.search.annotations.Parameter(name = "minGramSize", value = "4"),
                                @org.hibernate.search.annotations.Parameter(name = "maxGramSize", value = "10")
                        }
                )
        })
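To see what the edgeNgram definition produces at indexing time, here is a plain-Java sketch that mimics the three-filter chain without Lucene. It is an approximation, not Hibernate Search or Lucene code. The accent folding uses java.text.Normalizer to strip combining marks, which covers é/è but not everything ASCIIFoldingFilter handles (for example œ or ß). Tokens shorter than minGramSize are assumed to produce no grams. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java approximation of the "edgeNgram" analyzer chain (not Lucene code).
public class EdgeNgramSketch {

    // Approximates ASCIIFoldingFilterFactory: decompose, then drop combining marks (é -> e).
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Whitespace tokenizer -> ASCII folding -> lowercase -> edge n-grams of length minGram..maxGram.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string from split
            }
            String token = fold(raw).toLowerCase(Locale.ROOT);
            // Assumption: a token shorter than minGram emits nothing.
            for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
                tokens.add(token.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Bou\u00e9es jaunes" is "Bouées jaunes"
        System.out.println(analyze("Bou\u00e9es jaunes", 4, 10));
        // prints [boue, bouee, bouees, jaun, jaune, jaunes]
    }
}
```

With minGramSize = 4 and maxGramSize = 10, an accented word like "Bouées" is stored only as accent-free, lowercase prefixes of 4 characters or more.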

And here I create my query.

query = queryBuilder
        .simpleQueryString()
        .boostedTo(3f) // This whole query is boosted so exact matches will obtain a better score
        .onFields("title.value", "keyword.values.value")
        .boostedTo(2f)
        .andField("description.values.value")
        //.withAndAsDefaultOperator()
        .matching(Arrays.stream(searchTerm.split(" ")).map(e -> e + "*").collect(Collectors.joining(" ")).toLowerCase())
        .createQuery();

I don't know how to set an analyzer for the search term searchTerm, and I couldn't find it in the Hibernate Search documentation. For now I split the term and lowercase it manually in Java, but that doesn't seem right.
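For reference, this is what the manual preprocessing passed to `.matching(...)` produces. The sketch wraps the same stream expression in a hypothetical `toWildcardTerms` helper. It appends `*` to each space-separated word and lowercases the result, but it does not fold accents, so `bouées` keeps its `é`.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class ManualTermSketch {

    // Same expression as in the question: split on single spaces, append "*", join, lowercase.
    // toLowerCase() uses the default locale and performs no accent folding.
    static String toWildcardTerms(String searchTerm) {
        return Arrays.stream(searchTerm.split(" "))
                .map(e -> e + "*")
                .collect(Collectors.joining(" "))
                .toLowerCase();
    }

    public static void main(String[] args) {
        // "Bou\u00e9es jaunes" is "Bouées jaunes"; prints "bouées* jaunes*"
        System.out.println(toWildcardTerms("Bou\u00e9es jaunes"));
    }
}
```

Two consecutive spaces produce an empty element in the split, so the output then contains a lone `*` (for example, `"a  b"` becomes `"a* * b*"`).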

What I want is to apply another analyzer to my query term such as:

@AnalyzerDef(name = "edgeNGram_query",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
                @TokenFilterDef(factory = LowerCaseFilterFactory.class) // Lowercase all characters
        })
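A plain-Java sketch of why the query-side definition leaves out the n-gram filter. With only folding and lowercasing, each search word becomes a single token that can be compared directly with the prefixes stored at indexing time. This is an approximation, not Hibernate Search API. It uses Normalizer-based folding and exact term comparison, and `matches` uses a simplified "every query token must be indexed" rule rather than simpleQueryString's real scoring and default OR operator. All names are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class QueryAnalyzerSketch {

    // Approximate ASCII folding: strip combining marks after NFD decomposition.
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Approximates "edgeNGram_query": whitespace tokenizer -> ASCII folding -> lowercase.
    static List<String> analyzeQuery(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(fold(raw).toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximates "edgeNgram" at indexing time: same steps plus edge n-grams of length 4..10.
    static Set<String> analyzeIndex(String text) {
        Set<String> grams = new HashSet<>();
        for (String token : analyzeQuery(text)) {
            for (int len = 4; len <= Math.min(10, token.length()); len++) {
                grams.add(token.substring(0, len));
            }
        }
        return grams;
    }

    // Simplified matching: every query token must be one of the document's indexed grams.
    static boolean matches(String document, String query) {
        return analyzeIndex(document).containsAll(analyzeQuery(query));
    }

    public static void main(String[] args) {
        String doc = "Bou\u00e9es jaunes"; // "Bouées jaunes"
        System.out.println(matches(doc, "bou\u00e9e")); // "bouée" -> "bouee", an indexed prefix
        System.out.println(matches(doc, "bouees"));
        System.out.println(matches(doc, "bou"));        // shorter than minGramSize, never indexed
    }
}
```

If the edgeNgram analyzer were also applied to the query, "bouees" would expand to "boue", "bouee" and "bouees". Under OR semantics, any document with a word starting with "boue" would then match as well, which is why a separate query analyzer is usually preferred with n-gram indexing.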

Do you know how to set a custom analyzer for the query term, and why it is not applied by default? If I search "bouees" it works, but if I search "bouées" it doesn't.

Thanks!

SOLUTION:

My issue was that I was using a simpleQueryString when I should have been using a keyword query. The simpleQueryString doesn't seem to run the analyzer on the search term! Then I just had to follow @yrodiere's answer and call .overridesForField( "description.values.value", "edgeNGram_query" ) to use the right analyzer for the search term.

score 1 · Accepted Answer · answered Aug 20 '20 at 12:09

In Hibernate Search 5, you have to call overridesForField when creating the query builder, to override the analyzer for each field:

QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class)
    .overridesForField( "title.value", "edgeNGram_query" )
    .overridesForField( "keyword.values.value", "edgeNGram_query" )
    .overridesForField( "description.values.value", "edgeNGram_query" )
    .get();

// Then it's business as usual
Query query = queryBuilder
        .simpleQueryString()
        .boostedTo(3f) // This whole query is boosted so exact matches will obtain a better score
        .onFields("title.value", "keyword.values.value")
        .boostedTo(2f)
        .andField("description.values.value")
        //.withAndAsDefaultOperator()
        .matching(searchTerm)
        .createQuery();
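Conceptually, `overridesForField` changes the analyzer used for the search term on the listed fields only. Every other field is still queried with the analyzer it was indexed with (`edgeNgram` here). The following is a hypothetical model of that lookup in plain Java, using a map of overrides with a fallback. It is not Hibernate Search's actual implementation, and `indexedWith` / `queryAnalyzerFor` are invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-field query analyzer resolution; not Hibernate Search code.
public class AnalyzerOverrideSketch {

    private final Map<String, String> indexAnalyzers = new HashMap<>();
    private final Map<String, String> overrides = new HashMap<>();

    // Record the analyzer a field was indexed with (what @Analyzer declares on the mapping).
    AnalyzerOverrideSketch indexedWith(String field, String analyzer) {
        indexAnalyzers.put(field, analyzer);
        return this;
    }

    // Mirrors the idea of overridesForField(field, analyzerName) on the query builder.
    AnalyzerOverrideSketch overridesForField(String field, String analyzer) {
        overrides.put(field, analyzer);
        return this;
    }

    // Query-time analyzer: the override if one was registered, otherwise the indexing analyzer.
    String queryAnalyzerFor(String field) {
        return overrides.getOrDefault(field, indexAnalyzers.get(field));
    }

    public static void main(String[] args) {
        AnalyzerOverrideSketch model = new AnalyzerOverrideSketch()
                .indexedWith("title.value", "edgeNgram")
                .indexedWith("description.values.value", "edgeNgram")
                .overridesForField("title.value", "edgeNGram_query");
        System.out.println(model.queryAnalyzerFor("title.value"));              // edgeNGram_query
        System.out.println(model.queryAnalyzerFor("description.values.value")); // edgeNgram
    }
}
```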

See also the end of this answer, which is probably where you got your code from in the first place? :)

If one day you upgrade to Hibernate Search 6 (currently in beta, with different APIs), you'll find it's much simpler: there is an option to override the analyzer when building your predicate. For example:

List<MyEntity> hits = searchSession.search( MyEntity.class )
        .where( f -> f.simpleQueryString()
                .fields( "title.value", "keyword.values.value" ).boost( 3f )
                .fields( "description.values.value" )
                .matching( searchTerm )
                 //.defaultOperator( BooleanOperator.AND )
                .analyzer( "edgeNGram_query" ) ) // <= HERE
        .fetchHits( 20 );

Thank you! Does this mean that, in any case, the same analyzer is applied to both the field and the search term? So when I don't call `overridesForField`, the `edgeNgram` analyzer is applied to my search term? If that is the case, I don't understand why "bouées" doesn't work and "bouees" does, even though my analyzer includes `ASCIIFoldingFilterFactory`. — Elbbard, Aug 20 '20 at 12:20
My issue was that I was using a `simpleQueryString` when I should have been using a `keyword` query. The `simpleQueryString` doesn't seem to run the analyzer on the search term! — Elbbard, Aug 20 '20 at 12:42
`simpleQueryString` does perform analysis, unless you ask it not to. The problem was probably something else. Couldn't tell you what it was, though... — yrodiere, Aug 21 '20 at 12:10
