I am trying to make Hibernate Search support both tokenized and untokenized search (pardon me if I am using the wrong terms here). Here is an example.
I have a list of entities of the following type.
@Entity
@Indexed
@NormalizerDef(name = "lowercase",
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    }
)
public class Deal {
    //other fields omitted for brevity purposes
    @Field(store = Store.YES)
    @Field(name = "name_Sort", store = Store.YES, normalizer = @Normalizer(definition = "lowercase"))
    @SortableField(forField = "name_Sort")
    @Column(name = "NAME")
    private String name = "New Deal";
    //Getters/Setters omitted here
}
I also use the keyword() method of the query builder to build the query, as shown below. The getSearchableFields method returns the list of searchable fields. In this example, "name" is in the returned list because the name field of Deal is searchable.
protected Query inputFilterBuilder() {
    return queryBuilder.keyword()
        .wildcard().onFields(getSearchableFields())
        .matching("*" + searchRequest.getQuery().toLowerCase() + "*").createQuery();
}
This setup works fine as long as I search with whole words only. For example, suppose I have two Deal entities, one named "Practical Concrete Hat" and the other named "Practical Cotton Cheese". Searching for "Practical" returns both entities, but searching for "Practical Co" returns none. I believe the reason is that the name field is tokenized: a wildcard term is matched against individual tokens, and no single token contains "Practical Co".
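To make this concrete, here is a minimal plain-Java sketch of the behaviour, with no Lucene dependency. It assumes that the index-time analyzer simply lowercases the name and splits it on whitespace, and that a wildcard pattern is compared with each indexed token on its own. The class and method names (WildcardTokenDemo, tokenize, toRegex, matchesAnyToken) are hypothetical and only stand in for what the real analyzer and wildcard query do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardTokenDemo {

    // Assumption: the index-time analyzer lowercases and splits on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Translates a wildcard pattern ('*' = any run of characters, '?' = one character) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The pattern is tested against each token separately, never against the whole field value.
    static boolean matchesAnyToken(String wildcard, String fieldValue) {
        Pattern pattern = toRegex(wildcard);
        for (String token : tokenize(fieldValue)) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Practical Concrete Hat";
        System.out.println(matchesAnyToken("*practical*", name));    // true
        System.out.println(matchesAnyToken("*practical co*", name)); // false
    }
}
```

In this model the tokens are "practical", "concrete" and "hat", so a pattern containing a space can never match any of them.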
My question is how to support both kinds of search at the same time, so that both entities are returned when searching for either "Practical" or "Practical Co".
I read through the official Hibernate Search documentation, and my hunch is that I should add one more field for untokenized search. Perhaps the way I build the query needs to be updated as well?
Update
Non-working solution using SimpleQueryString.
Based on the provided answer, I've written the following query builder logic. However, it doesn't work.
protected Query inputFilterBuilder() {
    String[] searchableFields = getSearchableFields();
    if (searchableFields.length == 0) {
        return queryBuilder.simpleQueryString().onField("").matching("").createQuery();
    }
    SimpleQueryStringMatchingContext simpleQueryStringMatchingContext = queryBuilder.simpleQueryString().onField(searchableFields[0]);
    for (int i = 1; i < searchableFields.length; i++) {
        simpleQueryStringMatchingContext = simpleQueryStringMatchingContext.andField(searchableFields[i]);
    }
    return simpleQueryStringMatchingContext
        .matching("\"" + searchRequest.getQuery() + "\"").createQuery();
}
Working solution using phrase queries and a separate analyzer for querying.
I found in the official documentation that phrase queries can be used to search for more than one word, so I wrote the following query builder method.
protected Query inputFilterBuilder() {
    String[] searchableFields = getSearchableFields();
    if (searchableFields.length == 0) {
        return queryBuilder.phrase().onField("").sentence("").createQuery();
    }
    PhraseMatchingContext phraseMatchingContext = queryBuilder.phrase().onField(searchableFields[0]);
    for (int i = 1; i < searchableFields.length; i++) {
        phraseMatchingContext = phraseMatchingContext.andField(searchableFields[i]);
    }
    return phraseMatchingContext.sentence(searchRequest.getQuery()).createQuery();
}
On its own, this did not work for searches with more than one word separated by a space. But once I added separate analyzers for indexing and querying, as suggested, it suddenly worked.
Analyzer definitions:
@AnalyzerDef(name = "edgeNgram", tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "1"),
                @Parameter(name = "maxGramSize", value = "10")
            })
    })
@AnalyzerDef(name = "edgeNGram_query", tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    })
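To see how these two analyzers differ, here is a rough plain-Java simulation, again without any Lucene dependency. It is only a sketch under several assumptions: ASCII folding is left out, the edge n-gram filter is modelled as "every prefix of each word from minGramSize to maxGramSize characters", all prefixes of a word are assumed to share that word's position, and a phrase query is modelled as "query tokens found at consecutive positions". The names EdgeNGramDemo, queryTokens, indexPositions and phraseMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EdgeNGramDemo {

    // Query-side sketch ("edgeNGram_query"): lowercase and split on whitespace, no n-grams.
    static List<String> queryTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Index-side sketch ("edgeNgram"): each word becomes the set of its prefixes of
    // length min..max; list index i holds the prefixes stored at position i.
    static List<Set<String>> indexPositions(String text, int min, int max) {
        List<Set<String>> positions = new ArrayList<>();
        for (String word : queryTokens(text)) {
            Set<String> grams = new LinkedHashSet<>();
            for (int len = min; len <= Math.min(max, word.length()); len++) {
                grams.add(word.substring(0, len));
            }
            positions.add(grams);
        }
        return positions;
    }

    // Simplified phrase match: every query token must be found at consecutive positions.
    static boolean phraseMatches(List<String> query, List<Set<String>> index) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= index.size(); start++) {
            boolean allFound = true;
            for (int i = 0; i < query.size(); i++) {
                if (!index.get(start + i).contains(query.get(i))) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> index = indexPositions("Practical Concrete Hat", 1, 10);
        System.out.println(index.get(1)); // [c, co, con, conc, concr, concre, concret, concrete]
        System.out.println(phraseMatches(queryTokens("Practical Co"), index)); // true
        System.out.println(phraseMatches(queryTokens("Practical Ha"), index)); // false
    }
}
```

In this model the query tokens "practical" and "co" are both present in the index at adjacent positions, which is why the partial second word matches. If the query were expanded into n-grams as well, it would contain extra tokens such as "p" or "c", which is presumably why the query side uses the analyzer without the n-gram filter. One consequence of the model is that a query word longer than maxGramSize would not be found, because only its first 10 characters are indexed.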
Annotation for Deal name field:
@Field(store = Store.YES, analyzer = @Analyzer(definition = "edgeNgram"))
@Field(name = "edgeNGram_query", store = Store.YES, analyzer = @Analyzer(definition = "edgeNGram_query"))
@Field(name = "name_Sort", store = Store.YES, normalizer = @Normalizer(definition = "lowercase"))
@SortableField(forField = "name_Sort")
@Column(name = "NAME")
private String name = "New Deal";
Code that overrides the name field's analyzer so that the query analyzer is used:
String[] searchableFields = getSearchableFields();
if (searchableFields.length > 0) {
    EntityContext entityContext = fullTextEntityManager.getSearchFactory()
        .buildQueryBuilder()
        .forEntity(this.getClass().getAnnotation(SearchType.class).clazz())
        .overridesForField(searchableFields[0], "edgeNGram_query");
    for (int i = 1; i < searchableFields.length; i++) {
        entityContext.overridesForField(searchableFields[i], "edgeNGram_query");
    }
    queryBuilder = entityContext.get();
}
Follow-up question: why does the above tweak actually work?