
I'm trying to implement full-text search with Hibernate Search. We need to search names, addresses, etc. A user can search for a name such as "John", "John Murphy", "Mark" or "Mark L Thomas", and for addresses such as "20601 Blvd", "first floor" and so on.

The current logic works for only a few words longer than 2 characters: "John" is searchable, but "Mark" is not. If I search for "Ma" I get results, but if I type "Mar" or "Mark" no records are returned. I am also able to search by city, e.g. "Columbia".

Multi-word search does not work either.

When I am not using any analyzer (as in the code below), the statements above hold. If I use the edgengram, text or standard analyzers I get different results, but none of the analyzers works. Below is the full code:

Index structure from which I'm trying to retrieve the data:

    {
      "_index" : "client_master_index_0300",
      "_type" : "com.csc.pt.svc.data.to.Basclt0300TO",
      "_id" : "518,1",
      "_score" : 4.0615783,
      "_source" : {
        "id" : "518,1",
        "cltseqnum" : 518,
        "addrseqnum" : "1",
        "addrln1" : "Dba",
        "addrln2" : "Betsy Evans",
        "city" : "SDA",
        "state" : "SC",
        "zipcode" : "89756-4531",
        "country" : "USA",
        "basclt0100to" : {
          "cltseqnum" : 518,
          "clientname" : "Betsy Evans",
          "longname" : "Betsy Evans",
          "id" : "518"
        },
        "basclt0900to" : {
          "cltseqnum" : 518,
          "id" : "518"
        }
      }
    }

Index definition for the same index:

    {
      "client_master_index_0300" : {
        "aliases" : { },
        "mappings" : {
          "com.csc.pt.svc.data.to.Basclt0300TO" : {
            "dynamic" : "strict",
            "properties" : {
              "addrln1" : {
                "type" : "text",
                "store" : true
              },
              "addrln2" : {
                "type" : "text",
                "store" : true
              },
              "addrln3" : {
                "type" : "text",
                "store" : true
              },
              "addrseqnum" : {
                "type" : "text",
                "store" : true
              },
              "basclt0100to" : {
                "properties" : {
                  "clientname" : {
                    "type" : "text",
                    "store" : true
                  },
                  "cltseqnum" : {
                    "type" : "long",
                    "store" : true
                  },
                  "firstname" : {
                    "type" : "text",
                    "store" : true
                  },
                  "id" : {
                    "type" : "keyword",
                    "store" : true,
                    "norms" : true
                  },
                  "longname" : {
                    "type" : "text",
                    "store" : true
                  },
                  "midname" : {
                    "type" : "text",
                    "store" : true
                  }
                }
              },
              "basclt0900to" : {
                "properties" : {
                  "cltseqnum" : {
                    "type" : "long",
                    "store" : true
                  },
                  "email1" : {
                    "type" : "text",
                    "store" : true
                  },
                  "id" : {
                    "type" : "keyword",
                    "store" : true,
                    "norms" : true
                  }
                }
              },
              "city" : {
                "type" : "text",
                "store" : true
              },
              "cltseqnum" : {
                "type" : "long",
                "store" : true
              },
              "country" : {
                "type" : "text",
                "store" : true
              },
              "id" : {
                "type" : "keyword",
                "store" : true
              },
              "state" : {
                "type" : "text",
                "store" : true
              },
              "zipcode" : {
                "type" : "text",
                "store" : true
              }
            }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1535607176216",
            "number_of_shards" : "5",
            "number_of_replicas" : "1",
            "uuid" : "x4R71LNCTBSyO9Taf8siOw",
            "version" : {
              "created" : "6030299"
            },
            "provided_name" : "client_master_index_0300"
          }
        }
      }
    }

The java objects containing the fields:

    @Field(name = "longname", index = Index.YES, store = Store.YES,
            analyze = Analyze.YES)
    private String longname = "";

    @Field(name = "firstname", index = Index.YES, store = Store.YES,
            analyze = Analyze.YES)
    private String firstname = "";

Further, currently I'm using the wildcard context query:

    public synchronized void searchClienData() {
        String lowerCasedSearchTerm = this.data.getSearchText().toLowerCase();

        SearchFactory searchFactory = fullTextSession.getSearchFactory();
        QueryBuilder queryBuilder = searchFactory.buildQueryBuilder().forEntity(Basclt0300TO.class).get();

        String[] projections = {"basclt0100to.longname", "basclt0100to.cltseqnum", "addrln1", "addrln2",
                "city", "state", "zipcode", "country", "basclt0900to.email1"};

        // Keyword query across the name and address fields.
        Query query = queryBuilder.keyword()
                .onField("basclt0100to.longname").andField("addrln1").andField("addrln2")
                .andField("city").andField("state").andField("country").matching(lowerCasedSearchTerm)
                .createQuery();

        FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query, Basclt0300TO.class);
        fullTextQuery.setMaxResults(this.data.getPageSize()).setFirstResult(this.data.getPageSize());

        List<String> projectedFields = new ArrayList<String>();
        for (String fieldName : projections)
            projectedFields.add(fieldName);

        // Project the stored fields and map each result tuple to a response object.
        @SuppressWarnings("unchecked")
        List<Cltj001ElasticSearchResponseTO> results = fullTextQuery
                .setProjection(projectedFields.toArray(new String[projectedFields.size()]))
                .setResultTransformer(new BasicTransformerAdapter() {
                    private static final long serialVersionUID = 1L;

                    @Override
                    public Cltj001ElasticSearchResponseTO transformTuple(Object[] tuple, String[] aliases) {
                        return new Cltj001ElasticSearchResponseTO((String) tuple[0], (long) tuple[1],
                                (String) tuple[2], (String) tuple[3], (String) tuple[4],
                                (String) tuple[5], (String) tuple[6], (String) tuple[7], (String) tuple[8]);
                    }
                })
                .getResultList();
        resultsClt0300MasterIndexList = results;
    }
ronak

1 Answer


First, you need to actually assign your analyzer definitions to your fields. Just defining the analyzers is not enough.

    @Field(name = "longname", index = Index.YES, store = Store.YES,
            analyze = Analyze.YES, analyzer = @Analyzer(definition = "theNameOfSomeAnalyzerDefinition"))
    private String longname = "";

    @Field(name = "firstname", index = Index.YES, store = Store.YES,
            analyze = Analyze.YES, analyzer = @Analyzer(definition = "theNameOfSomeAnalyzerDefinition"))
    private String firstname = "";

Then, you need to pick a strategy and stick to it:

  • either you use wildcard queries, which are easy to use and don't require EdgeNGram token filters, but tend to cause problems due to the query terms not being analyzed
  • or you apply EdgeNGram token filters to your fields, and at query time:
    • use a keyword query without the wildcard option
    • and override the analyzers to use different ones, which should have the same definition as the analyzers assigned to your fields, except they should not use an EdgeNGram token filter.
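A rough sketch of the second strategy, assuming Hibernate Search 5.x with the Lucene analysis factories on the classpath. The entity is trimmed down to two fields for brevity; the analyzer name `my_analyzer_with_ngrams`, the gram sizes, and the `ClientSearchSketch` class are illustrative assumptions, not taken from the question's code. It depends on the Hibernate Search library, so treat it as a starting point rather than a drop-in implementation:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.search.Query;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.AnalyzerDefs;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;
import org.hibernate.search.query.dsl.QueryBuilder;

// Trimmed-down stand-in for the entity in the question; only two fields are shown.
@Entity
@Indexed
@AnalyzerDefs({
    // Index-time analyzer: tokenize on whitespace, lowercase, then emit prefixes of each token.
    @AnalyzerDef(name = "my_analyzer_with_ngrams",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
                @Parameter(name = "minGramSize", value = "1"),  // assumed value
                @Parameter(name = "maxGramSize", value = "10")  // assumed value
            })
        }),
    // Query-time analyzer: same definition, minus the EdgeNGram filter.
    @AnalyzerDef(name = "my_analyzer_without_ngrams",
        tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class)
        })
})
class Basclt0300TO {
    @Id
    private String id;

    @Field(analyzer = @Analyzer(definition = "my_analyzer_with_ngrams"))
    private String addrln1 = "";

    @Field(analyzer = @Analyzer(definition = "my_analyzer_with_ngrams"))
    private String city = "";
}

class ClientSearchSketch {
    // Builds a plain keyword query (no wildcard) whose terms are analyzed without n-grams.
    static Query buildQuery(FullTextSession fullTextSession, String searchTerm) {
        QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder()
            .forEntity(Basclt0300TO.class)
            // One override per field that was indexed with the n-gram analyzer.
            .overrideForField("addrln1", "my_analyzer_without_ngrams")
            .overrideForField("city", "my_analyzer_without_ngrams")
            .get();

        return queryBuilder.keyword()
            .onFields("addrln1", "city")
            .matching(searchTerm)
            .createQuery();
    }
}
```

Because the query-time analyzer already lowercases the input, the search term can be passed in as typed.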

But do not mix the two approaches. Never. It just won't work.
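To see why the query-time analyzer should drop the EdgeNGram filter, here is a small plain-Java simulation. It does not use Hibernate Search or Lucene; `EdgeNGramDemo` and its methods are made up for illustration, and "a document matches when any query term equals an indexed term" is a simplifying assumption about how a keyword query behaves:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EdgeNGramDemo {

    // Stand-in for a whitespace tokenizer followed by a lowercase filter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for the same analyzer plus an edge n-gram filter:
    // every token is replaced by its prefixes of length minGram..maxGram.
    static Set<String> edgeNGrams(String text, int minGram, int maxGram) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokenize(text)) {
            int limit = Math.min(maxGram, token.length());
            for (int len = minGram; len <= limit; len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    // Simplified matching: a hit if any query term is among the indexed terms.
    static boolean matches(Set<String> indexedTerms, Collection<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = edgeNGrams("Mark L Thomas", 1, 10);

        // Query analyzed WITHOUT n-grams: prefixes and whole words match, unrelated words do not.
        System.out.println(matches(indexed, tokenize("Mar")));    // true
        System.out.println(matches(indexed, tokenize("Thomas"))); // true
        System.out.println(matches(indexed, tokenize("Max")));    // false

        // Query analyzed WITH n-grams: "Max" becomes m, ma, max and matches via "m".
        System.out.println(matches(indexed, edgeNGrams("Max", 1, 10))); // true (false positive)
    }
}
```

The last line shows the over-matching that appears when the n-gram analyzer is also applied to the query, which is what the analyzer override is meant to avoid.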

yrodiere
  • If I understand you correctly: 1. I need to apply the EdgeNGram filters I already had in my code to the fields. 2. I need to use simple keyword queries. But will they work on multiple words like "John L Murphy" and "Neha Micheal"? – ronak Aug 29 '18 at 13:12
  • Just one more question: how can I override multiple fields? I need to search on multiple fields. – ronak Aug 29 '18 at 13:20
  • 1. Yes. 2. Yes, and yes they will work with multiple words if your analyzer performs tokenization (uses "WhitespaceTokenizerFactory" and not "KeywordTokenizerFactory"). "keyword()" is a bad name for the DSL method, I know. 3. As I said, `my_analyzer_without_ngrams` should be defined exactly as the analyzer applied to the field, but without the `EdgeNGram` token filter. – yrodiere Aug 29 '18 at 13:20
  • Override multiple fields by calling the `overrideForField()` method multiple times. – yrodiere Aug 29 '18 at 13:21
  • I've implemented it as suggested, but now I'm not getting any results for my simple keyword query. `Query query = queryBuilder.keyword() .onFields("basclt0100to.longname", "basclt0100to.firstname", "addrln1", "addrln2" ,"city","state","zipcode", "country", "basclt0900to.email1") .matching(lowerCasedSearchTerm) .createQuery();` Am I doing something wrong here? Please advise. – ronak Aug 29 '18 at 15:29
  • In addition to the above comment, I just want to clarify that I'm getting data for some strings and not for others, though the query fires on the same column of the index. – ronak Aug 29 '18 at 15:37
  • Sorry if I'm bugging you, but it's weird. The logic works for words like "John" and "Akash", even if they are written after a space as the second word. But it is not working for words like "Betsy" and "Gopal". Not sure what's wrong. – ronak Aug 29 '18 at 15:58
  • You need to be more specific. When you say "it doesn't work with 'Betsy'", I really don't have enough information to help you. Please update your question with a separate section containing the full, up-to-date code: the analyzer definitions, the entity code, the query code, the documents you tried to match and the exact input to your search method that did not return the expected results, and an accurate description of those unexpected results (which documents, which content). Also please stop copying my answers to your question, it just makes it all harder to understand. – yrodiere Aug 30 '18 at 06:43
  • Updated the question with my current query. – ronak Aug 30 '18 at 07:16
  • You didn't override the analyzers using `searchFactory.buildQueryBuilder().forEntity(Basclt0300TO.class).overrideForField( ... ).get()`. You must do it; I didn't mention it as some optional requirement. – yrodiere Aug 30 '18 at 07:47