1

This is my first time dealing with optimized search functionality. Most of my experience is in front-end Android development, but I'm willing to take on Hibernate Search. I understand what the SQL "LIKE" query does and its limitations, which is why I jumped straight to Hibernate Search (Lucene). My goal is to offer auto-suggestions based on user input (input queries). This is what I have so far:

@Indexed
@Table (name = "shop_table")
@Entity
@AnalyzerDef(name = "myanalyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params =
                                      { @Parameter(name = "maxGramSize", value = "1024") })
    })
@Analyzer(definition = "myanalyzer")
public class Shop implements Serializable {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

@Field(index = Index.YES, store = Store.YES, analyze = Analyze.YES)
@Column(name = "name")
private String name;

... other methods

My query

 Query lucenQuery = qb.keyword().onField("name").matching(searchTerm).createQuery();

It's just a basic query, and I have focused solely on the analyzer configuration to get what I want. It's really confusing which part I should focus on to achieve this: the tokenizing, the filtering, or the query itself? Anyway, I have these two phrases already indexed:

"Apache Lychee Department" 
"Apache Strawberry Club Large"

When I query "Straw" it gives me "Apache Strawberry Club Large", but when I query "Lychee" or "Apache Lychee" the query gives me both. I'm only expecting "Apache Lychee Department".

The way I understand my configuration:

EdgeNGramFilterFactory (1024) will give me a series of 1,024 indexed edge n-grams.

LowerCaseFilterFactory will give me all lower-cased index entries.

WordDelimiterFilterFactory will filter the query by treating it as one word, and give me the matching data.

And every entry will be tokenized as a keyword by KeywordTokenizerFactory and indexed as 1,024 edge n-grams.

I tried to query a phrase, but I'm still getting the same output:

  Query luceneQuery = qb.phrase().onField("name").sentence(searchTerm).createQuery();

My goal is to have auto-suggestion, or at least to start by mimicking SQL's "LIKE".

2 Answers

1

There are two things you should take into account:

  • By default, when there are multiple terms in a query, the results will include documents that match any of the terms, not all of the terms.
  • By default, your queries will be analyzed using the same analyzer you used when indexing.

This means in particular that your query "Lychee" will be analyzed to something like "L Ly Lyc Lych Lyche Lychee" (because of the edge ngram filter). The string "Apache Strawberry Club Large" was previously analyzed and the term "Large" was expanded to "L La Lar Larg Large" because of the edge ngram filter. So the query "Lychee" will match "Apache Strawberry Club Large", simply because they both contain a word that starts with L...

That's obviously undesired behavior.
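To make the overlap concrete, here is a minimal plain-Java sketch. It does not use Lucene at all; it only imitates a whitespace tokenizer, a lowercase filter and an edge ngram filter, assuming a minimum gram size of 1. The class and method names are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class EdgeNgramOverlap {

    // Simplified stand-in for: whitespace tokenizer -> lowercase -> edge n-grams.
    // NOT Lucene code; minGram/maxGram bound the length of each emitted prefix.
    public static Set<String> analyzeWithEdgeNgrams(String text, int minGram, int maxGram) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int longest = Math.min(maxGram, token.length());
            for (int len = minGram; len <= longest; len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> indexed = analyzeWithEdgeNgrams("Apache Strawberry Club Large", 1, 1024);
        Set<String> query = analyzeWithEdgeNgrams("Lychee", 1, 1024);

        // Terms that the analyzed query and the analyzed document have in common.
        Set<String> shared = new LinkedHashSet<>(query);
        shared.retainAll(indexed);

        System.out.println("query terms:  " + query);   // [l, ly, lyc, lych, lyche, lychee]
        System.out.println("shared terms: " + shared);  // [l]
    }
}
```

The single shared term "l" is enough for a query that matches on any term to return the unrelated document.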

The first step would be to change the way your query is analyzed, so that you don't end up matching completely unrelated documents. Basically you will need to define another analyzer which is almost identical, but does not have the "edge ngram" filter. Then you will need to tell Hibernate Search to use that analyzer to analyze your query.

See this answer for a detailed explanation.
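As a rough illustration of why a separate query analyzer helps, the plain-Java sketch below keeps edge ngrams on the indexing side only. It is a simplified model, not Hibernate Search or Lucene code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class QueryAnalyzerSketch {

    // Query-time stand-in: whitespace split and lowercase, but no edge n-grams.
    public static List<String> queryTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Index-time stand-in: same tokens, each expanded to all of its prefixes.
    public static Set<String> indexTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : queryTerms(text)) {
            for (int len = 1; len <= token.length(); len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    // True if at least one query term is among the document's indexed terms.
    public static boolean matchesAny(String document, String query) {
        Set<String> indexed = indexTerms(document);
        for (String term : queryTerms(query)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String lychee = "Apache Lychee Department";
        String strawberry = "Apache Strawberry Club Large";
        System.out.println(matchesAny(lychee, "Lychee"));      // true
        System.out.println(matchesAny(strawberry, "Lychee"));  // false
        System.out.println(matchesAny(strawberry, "Straw"));   // true
    }
}
```

Prefix matching still works because the prefixes live in the index; the query term itself is no longer chopped up into one-letter terms.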

As a second step, you need to make your query match if all the included terms are present in a document. To that end, the easiest solution would be to use the simple query string query instead of the keyword query.

Replace this:

Query lucenQuery = qb.keyword().onField("name").matching(searchTerm).createQuery();

With this:

Query lucenQuery = qb.simpleQueryString().onField("name").withAndAsDefaultOperator().matching(searchTerm).createQuery();

The key being the call to .withAndAsDefaultOperator().
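The difference between "any term" and "all terms" can be sketched in plain Java as follows. This is only a simplified model of the matching semantics, assuming the index holds edge ngrams and the query is not ngrammed; it is not how Lucene evaluates queries internally, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DefaultOperatorSketch {

    // Query side: whitespace split and lowercase only.
    public static List<String> queryTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Index side: every prefix of every lowercased token.
    public static Set<String> indexTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : queryTerms(text)) {
            for (int len = 1; len <= token.length(); len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    // requireAll = true models AND between terms, false models OR.
    public static boolean matches(String document, String query, boolean requireAll) {
        Set<String> indexed = indexTerms(document);
        List<String> terms = queryTerms(query);
        if (terms.isEmpty()) {
            return false;
        }
        for (String term : terms) {
            boolean hit = indexed.contains(term);
            if (requireAll && !hit) {
                return false; // AND: one missing term rejects the document
            }
            if (!requireAll && hit) {
                return true;  // OR: one matching term accepts the document
            }
        }
        return requireAll;
    }

    public static void main(String[] args) {
        String lychee = "Apache Lychee Department";
        String strawberry = "Apache Strawberry Club Large";
        System.out.println(matches(strawberry, "Apache Lychee", false)); // true
        System.out.println(matches(strawberry, "Apache Lychee", true));  // false
        System.out.println(matches(lychee, "Apache Ly", true));          // true
    }
}
```

With OR, the shared word "Apache" is enough to pull in the strawberry document; with AND, every typed word has to be a prefix of some word in the document.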

This change will have several other effects, such as enabling special syntax in the input string, so I'd encourage you to read the reference documentation to know what simpleQueryString is about exactly.

yrodiere
  • Sorry for the late response. Thank you so much for this very informative assistance; I'm at least heading in the right direction. Thank you – Nestor Briaga Mar 11 '19 at 02:39
  • Also, I didn't realize that the n-gram filter creates index entries for each word; I thought it would only index per entry. I didn't expect it to create index entries at every whitespace. – Nestor Briaga Mar 11 '19 at 03:24
  • Hello again, I made it work. I checked the link you gave me; I didn't know you could define a second, separate analyzer. Would you mind checking the code I wrote to see if I'm taking the right approach? – Nestor Briaga Mar 11 '19 at 05:08
0

I made it work with the following, thanks to @yrodiere:

@Indexed
@Table (name = "shop_table")
@Entity
@AnalyzerDef(name = "edgeNgram",
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = 
                                      { @Parameter(name = "maxGramSize", value = "1024") }),
    })
@AnalyzerDef(name = "search_query_analyzer",
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    })
public class Shop implements Serializable {

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;


@Field(store = Store.YES, analyze = Analyze.YES)
@Column(name = "name")
@Analyzer(definition = "edgeNgram")
private String name;

public void setName(String name) {
    this.name = name;
}

public String getName() {
    return this.name;
}
}

and my query:

QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Shop.class)
        .overridesForField("name", "search_query_analyzer").get();

Query lucenQuery = qb.simpleQueryString().onField("name").withAndAsDefaultOperator().matching(shopSearchTerm).createQuery();

But I'm not sure if I'm implementing it with the proper approach.

  • You should use `ASCIIFoldingFilterFactory` for both analyzers, or none of them. But apart from that it looks good. – yrodiere Mar 11 '19 at 10:39
  • Thanks @yrodiere, I'll do as you suggest and dig into what the other factories do. Thank you very much for the assistance. This does everything I need for the mobile app I'm creating; it's just a little cumbersome that I have to do this and the backend as well. Thanks very much! – Nestor Briaga Mar 12 '19 at 02:32
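
For anyone wondering why the folding filter should be on both analyzers or on neither, here is a small plain-Java sketch of the mismatch. It approximates ASCII folding with java.text.Normalizer, which covers far fewer characters than Lucene's ASCIIFoldingFilter, and it assumes a single-word query; the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class FoldingSymmetrySketch {

    // Crude approximation of ASCII folding: decompose accented characters,
    // then drop the combining marks. Lucene's filter handles many more cases.
    public static String fold(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Index side: optional folding, lowercase, then every prefix of every token.
    public static Set<String> indexTerms(String text, boolean foldAccents) {
        String normalized = (foldAccents ? fold(text) : text).toLowerCase(Locale.ROOT);
        Set<String> terms = new LinkedHashSet<>();
        for (String token : normalized.trim().split("\\s+")) {
            for (int len = 1; len <= token.length(); len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    // Query side always folds, like the search_query_analyzer in the answer above.
    // Assumes the query is a single word.
    public static boolean matches(String document, boolean foldIndex, String query) {
        String term = fold(query).toLowerCase(Locale.ROOT).trim();
        return indexTerms(document, foldIndex).contains(term);
    }

    public static void main(String[] args) {
        String document = "Caf\u00e9 Lychee"; // "Café Lychee"
        // Folding only on the query side: the index holds "café", the query asks for "cafe".
        System.out.println(matches(document, false, "Caf\u00e9")); // false
        // Folding on both sides: index and query agree on "cafe".
        System.out.println(matches(document, true, "Caf\u00e9"));  // true
    }
}
```

In this model, a name containing an accented character cannot be found by typing it exactly as stored, because only the query is folded.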