Hibernate Search | ngram analyzer with minGramSize 1

Question

I have a problem with my Hibernate Search analyzer configuration. One of my indexed entities ("Hospital") has a String field ("name") whose values can be 1 to 40 characters long. I want to be able to find an entity by searching for just one character (because a hospital could have a single-character name).

@Indexed(index = "HospitalIndex")
@AnalyzerDef(name = "ngram",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = StandardFilterFactory.class),
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = NGramFilterFactory.class,
                        params = {
                                @Parameter(name = "minGramSize", value = "1"),
                                @Parameter(name = "maxGramSize", value = "40")})
        })
public class Hospital {

        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO, analyzer = @Analyzer(definition = "ngram"))
        private String name = "";
}

If I add a hospital with name "My Test Hospital" the Lucene index looks like this:

1   name    al
1   name    e
1   name    es
1   name    est
1   name    h
1   name    ho
1   name    hos
1   name    hosp
1   name    hospi
1   name    hospit
1   name    hospita
1   name    hospital
1   name    i
1   name    it
1   name    ita
1   name    ital
1   name    l
1   name    m
1   name    my
1   name    o
1   name    os
1   name    osp
1   name    ospi
1   name    ospit
1   name    ospita
1   name    ospital
1   name    p
1   name    pi
1   name    pit
1   name    pita
1   name    pital
1   name    s
1   name    sp
1   name    spi
1   name    spit
1   name    spita
1   name    spital
1   name    st
1   name    t
1   name    ta
1   name    tal
1   name    te
1   name    tes
1   name    test
1   name    y
1   name    a
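
For readers who want to reproduce the term list above without a Lucene setup, here is a minimal plain-Java sketch. It is not Lucene's implementation: it assumes that splitting on whitespace is a good enough stand-in for StandardTokenizerFactory on this input, and the names `NGramSketch` and `ngramTerms` are made up for illustration.

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Hibernate Search or Lucene.
final class NGramSketch {

    // Lowercases the text, splits it on whitespace and collects every
    // substring of each token whose length is between minGram and maxGram.
    static SortedSet<String> ngramTerms(String text, int minGram, int maxGram) {
        SortedSet<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (int start = 0; start < token.length(); start++) {
                for (int len = minGram;
                        len <= maxGram && start + len <= token.length();
                        len++) {
                    terms.add(token.substring(start, start + len));
                }
            }
        }
        return terms;
    }
}
```

Calling `NGramSketch.ngramTerms("My Test Hospital", 1, 40)` yields the same 46 distinct terms as the index dump above, including the single characters that make one-character searches possible.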

This is how I build and execute my search query:

QueryBuilder hospitalQb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class).get();
Query hospitalQuery = hospitalQb.keyword().onFields("name").matching(searchString).createQuery();
javax.persistence.Query persistenceQuery = fullTextEntityManager.createFullTextQuery(hospitalQuery, Hospital.class);
List<Hospital> results = persistenceQuery.getResultList();

The problem is that the same ngram analyzer is also applied to my search query. So when I search for "hospital", for example, I find every hospital whose name contains an "a" character. This is what the search query looks like when I call toString() on it:

name:h name:ho name:hos name:hosp name:hospi name:hospit name:hospita name:hospital name:o name:os name:osp name:ospi name:ospit name:ospita name:ospital name:s name:sp name:spi name:spit name:spita name:spital name:p name:pi name:pit name:pita name:pital name:i name:it name:ita name:ital name:t name:ta name:tal name:a name:al name:l

So the question is: does anybody know a better analyzer configuration, or another way to build the search query, that solves the problem?

The answer from Yoann is correct. Adding a couple of suggestions: don't use such a large `maxGramSize`; for most use cases pick 3 or 4. You might also want to index the same field with multiple @Field annotations: give each a different name and a different analyzer, then at query time run a boolean query targeting both fields, each with its own analyzer. – Sanne, Mar 28 '17 at 09:51

yrodiere · Accepted Answer · 2022-03-22T09:38:50.587

Updated answer for Hibernate Search 6

With Hibernate Search 6, you can define a second analyzer, identical to your "ngram" analyzer except that it does not have an ngram filter, and assign it as the searchAnalyzer for your field:

public class Hospital {
        // ...

        @FullTextField(analyzer = "ngram",
                searchAnalyzer = "my_analyzer_without_ngrams")
        private String name = "";

        // ...
}

Then Hibernate Search will automatically use the "ngram" analyzer when indexing, but "my_analyzer_without_ngrams" when searching, which will lead to the expected behavior.
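
To make the reasoning concrete, here is a minimal plain-Java simulation of the two setups. It is only a sketch and does not use Hibernate Search or Lucene: it assumes whitespace tokenization, it assumes that a document matches when at least one query term is present in the index (which is what the space-separated `name:...` clauses in the question's query suggest), and the class name, method names and the "Alpha Clinic" sample are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation, not part of Hibernate Search or Lucene.
final class SearchAnalyzerSketch {

    // Stand-in for "my_analyzer_without_ngrams": lowercased whitespace tokens.
    static Set<String> plainTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Stand-in for the "ngram" analyzer: every substring of each token
    // whose length is between minGram and maxGram.
    static Set<String> ngramTerms(String text, int minGram, int maxGram) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : plainTerms(text)) {
            for (int start = 0; start < token.length(); start++) {
                for (int len = minGram;
                        len <= maxGram && start + len <= token.length();
                        len++) {
                    terms.add(token.substring(start, start + len));
                }
            }
        }
        return terms;
    }

    // Assumed matching rule: a document matches if it shares any term with the query.
    static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With "Alpha Clinic" indexed through `ngramTerms`, the query "hospital" matches when it is also run through `ngramTerms` (both sides contain "a", "l", "p" and so on) but not when it is run through `plainTerms`. A one-character query such as "a" still matches either way, because the single-character grams are in the index.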

Additionally, if you are implementing some kind of auto-completion (foo*), and not in-word search (*foo*), you may want to use EdgeNGramFilterFactory instead of NGramFilterFactory: it will only generate ngrams that are prefixes of the indexed tokens.
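
The difference between the two filters can be illustrated with a small plain-Java sketch. This is not Lucene's code: `EdgeNGramSketch` and `edgeNgrams` are hypothetical names, and the method only mimics the prefix-only behavior described above for a single, already lowercased token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hibernate Search or Lucene.
final class EdgeNGramSketch {

    // Returns only the prefixes of the token whose length is between
    // minGram and maxGram, shortest first.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

For the token "hospital" with sizes 1 to 40, this produces the eight prefixes "h", "ho", "hos", up to "hospital", and nothing like "pital" or "spit". A search for "hosp" would still match, while a search for "pital" would not.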

Original answer for Hibernate Search 5

You can set up a second analyzer, identical to your "ngram" analyzer except that it does not have an ngram filter, and then override the analyzer used for queries:

QueryBuilder hospitalQb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class)
    .overridesForField( "name", "my_analyzer_without_ngrams" )
    .get();
// Then it's business as usual

Additionally, if you are implementing some kind of auto-completion (foo*), and not in-word search (*foo*), you may want to use EdgeNGramFilterFactory instead of NGramFilterFactory: it will only generate ngrams that are prefixes of the indexed tokens.

edited Mar 22 '22 at 09:38

answered Mar 27 '17 at 13:07

yrodiere


Thanks for your help. That nearly solves the problem, but is it possible to override the analyzer for all fields? I have many embedded indexed entities with the same issue, so I would have to override them all ( .overridesForField( "careUnits.name" ....). Maybe it is possible to load an instance of "my_analyzer_without_ngrams" programmatically and build the search query with this instance? – André Mar 29 '17 at 13:07
@Andre Could you provide the actual code you are using? I only see one field in your original question, so I don't see what the problem is exactly, and the solution may differ depending on the nature of the problem. Do you build one query targeting multiple fields? Multiple queries, each targeting a single field? Something else? – yrodiere Mar 29 '17 at 15:33
In the original question I tried to reduce the complexity. I have more than one indexed entity in the real implementation. This is how I implement the search at the moment: [Pastebin](https://pastebin.com/itx1Nh9E). It works this way, but all the manual overrides for all fields seem a bit dirty to me. So if you know a better way to solve this, I would be happy to hear your solution. Thanks for your help and your time. – André Apr 03 '17 at 10:42
Ok, judging from what you are doing, you'll probably be better off simply parsing the search string using a `org.apache.lucene.queryparser.simple.SimpleQueryParser`. Just use the `SimpleQueryParser(Analyzer analyzer, Map weights)` constructor, and retrieve your analyzer by executing `fullTextSession.getSearchFactory().getAnalyzer("search")`. Note that we are in the process of [adding support for such parsing to the QueryBuilder](https://github.com/hibernate/hibernate-search/pull/1318), but this won't be available before 5.8, in a few weeks at best. – yrodiere Apr 04 '17 at 12:44
@yrodiere I am interested in this answer, but can you explain how the override works in more detail, perhaps with an example? – Al Grant Apr 24 '19 at 21:25
