lucene - search with contains value

Question

I have a field socialReason in my database with the following values:

ch fleyriat
CLINIQUE DENTAIRE MUTUALISTE
CENTRE DE SOINS INFIRMIERS BETSCHDORF

For example, when I search for the word CH, I want to get every value that contains CH. In my case, I want to get ch fleyriat and CENTRE DE SOINS INFIRMIERS BETSCHDORF.

I tried the code below, but it returns nothing:

  @Field(analyzer = @Analyzer(definition = "test"))
  private String socialReason;

  public class CustomAnalyzerProvider implements LuceneAnalysisDefinitionProvider {
    @Override
    public void register(LuceneAnalysisDefinitionRegistryBuilder builder) {
        builder
            .analyzer( "test" )
                .tokenizer( KeywordTokenizerFactory.class )
                .tokenFilter( ASCIIFoldingFilterFactory.class )
                .tokenFilter( LowerCaseFilterFactory.class );
    }
  }

  fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(ExerciseFrameworkEntity.class)
      .overridesForField("socialReason", "test").get();

  listOfQuery.add(getQueryBuilder().keyword().onField("socialReason").matching(socialReason).createQuery());

score 0 · Answer 1 · answered Feb 02 '22 at 15:52

You're after the ngram token filter, a filter that will generate a list of all substrings of each word in your index.

As you can imagine, this will generate a lot of data and thus your index will be very large. Only do this for reasonably small datasets.
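To make the idea concrete, here is a minimal plain-Java sketch of what an n-gram filter produces and why it gives "contains" matching. It is not Lucene's implementation: the class `NGramSketch`, its methods, the whitespace split, and the gram sizes 2 to 10 are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: a hypothetical stand-in for an n-gram token filter, not Lucene code.
final class NGramSketch {

    // Returns every substring of `token` whose length is between minGram and maxGram (inclusive).
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    // Lowercases the field value, splits it on whitespace (an assumption about the tokenizer),
    // and reports whether any n-gram of any word equals the lowercased query.
    static boolean matches(String fieldValue, String query, int minGram, int maxGram) {
        String needle = query.toLowerCase(Locale.ROOT);
        for (String word : fieldValue.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (ngrams(word, minGram, maxGram).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] values = {
            "ch fleyriat",
            "CLINIQUE DENTAIRE MUTUALISTE",
            "CENTRE DE SOINS INFIRMIERS BETSCHDORF"
        };
        for (String value : values) {
            System.out.println(value + " -> " + matches(value, "CH", 2, 10));
        }
    }
}
```

With these settings the single word "betschdorf" already expands to 45 grams, which is where the index growth comes from. The query must be matched as-is against the grams (lowercased, but not itself split into n-grams), otherwise a query such as "CH" would also match on its sub-grams.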

See the analyzer configuration mentioned in this question, and see also the answer to that question to query this field correctly.

If you can, I would also recommend reconsidering your requirements: it's generally enough to match words that begin with what the user typed (in your example, words that begin with "ch"), and that can be implemented with a much lower overhead thanks to the edgeNgram token filter. To do that, see this other answer.
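For comparison, an edge n-gram filter keeps only the prefixes of each word. The sketch below is again plain Java rather than Lucene code; `EdgeNGramSketch`, its methods, the whitespace split, and the gram sizes 1 to 10 are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: a hypothetical stand-in for an edge n-gram token filter, not Lucene code.
final class EdgeNGramSketch {

    // Returns the prefixes of `token` whose length is between minGram and maxGram (inclusive).
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Lowercases the field value, splits it on whitespace (an assumption about the tokenizer),
    // and reports whether any word has a prefix equal to the lowercased query.
    static boolean matches(String fieldValue, String query, int minGram, int maxGram) {
        String needle = query.toLowerCase(Locale.ROOT);
        for (String word : fieldValue.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (edgeNgrams(word, minGram, maxGram).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("fleyriat", 1, 10));
        System.out.println(matches("ch fleyriat", "CH", 1, 10));
        System.out.println(matches("CENTRE DE SOINS INFIRMIERS BETSCHDORF", "CH", 1, 10));
    }
}
```

Each word yields at most one gram per length, so the index stays far smaller than with full n-grams. The trade-off is that "CH" matches "ch fleyriat" but not "BETSCHDORF", because "ch" is not a prefix of any word in that value.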

Sorry, I made a typo in my question: I replaced `overridesForField("socialReason", "edgeNgram")` with `overridesForField("socialReason", "test")`. I don't want the words that begin with "ch"; I want the words that contain "ch". As I understand it, with `EdgeNGramFilterFactory, minGramSize="1", maxGramSize="10"` the analyzer transforms "ch fleyriat" into "c", "ch", "f", "fl", "fle", "fley", "fleyr", "fleyri", "fleyria", "fleyriat", and the word "BETSCHDORF" into "B", "BE", "BET", "BETS", "BETSC", "BETSCH"..., so when the user types "CH", it returns only "ch fleyriat". - Aymen Kanzari, Feb 02 '22 at 18:45
