
I want to perform substring/partial-word matching using Elasticsearch, and I want the results returned in a particular order. To explain my problem, I will show how I create my index and mappings, and which records I use.

Creating the index and mappings:

PUT /my_index1
{
    "settings": {
        "analysis": {
            "filter": {
                "trigrams_filter": {
                    "type":     "ngram",
                    "min_gram": 3,
                    "max_gram": 3
                }
            },
            "analyzer": {
                "trigrams": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":   [
                        "lowercase",
                        "trigrams_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "my_type1": {
            "properties": {
                "text": {
                    "type":     "string",
                    "analyzer": "trigrams" 
                }
            }
        }
    }
}
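To make the effect of the `trigrams` analyzer concrete, here is a rough Python approximation of the tokens it would emit. It is only a sketch: `trigram_tokens` is a hypothetical helper, and a plain whitespace split stands in for the standard tokenizer, on the assumption that the tokenizer keeps "men's" as a single token.

```python
def trigram_tokens(text, n=3):
    """Approximate the 'trigrams' analyzer: split into words, lowercase, emit n-grams."""
    grams = []
    # Assumption: a whitespace split stands in for the standard tokenizer.
    for word in text.lower().split():
        # Words shorter than n produce no grams at all.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


query_grams = trigram_tokens("en's shaver")
print(query_grams)

# Check which sample titles contain every trigram of the query.
for title in ["men's shaver", "women's shavers", "men's foil shaver"]:
    doc_grams = set(trigram_tokens(title))
    print(title, all(g in doc_grams for g in query_grams))
```

Under these assumptions every sample title contains all of the query's trigrams, so a 100% `minimum_should_match` would not filter any of them out and the order would depend on scoring alone.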

Bulk record insert:

POST /my_index1/my_type1/_bulk
{ "index": { "_id": 1 }}
{ "text": "men's shaver" }
{ "index": { "_id": 2 }}
{ "text": "men's foil shaver" }
{ "index": { "_id": 3 }}
{ "text": "men's foil advanced shaver" }
{ "index": { "_id": 4 }}
{ "text": "norelco men's foil advanced shaver" }
{ "index": { "_id": 5 }}
{ "text": "men's shavers" }
{ "index": { "_id": 6 }}
{ "text": "women's shaver" }
{ "index": { "_id": 7 }}
{ "text": "women's foil shaver" }
{ "index": { "_id": 8 }}
{ "text": "women's foil advanced shaver" }
{ "index": { "_id": 9 }}
{ "text": "norelco women's foil advanced shaver" }
{ "index": { "_id": 10 }}
{ "text": "women's shavers" }

Now I want to search for "en's shaver". I'm searching using the following query:

POST /my_index1/my_type1/_search
{
    "query": {
        "match": {
            "text": {
                "query": "en's shaver",
                "minimum_should_match": "100%"
            }
        }
    }
}

I want the results in the following sequence:

  1. men's shaver --> closest match, with the terms in the same order as the search keywords "en's shaver"
  2. women's shaver --> closest match, with the terms in the same order as the search keywords "en's shaver"
  3. men's foil shaver --> distance increased by 1
  4. women's foil shaver --> distance increased by 1
  5. men's foil advanced shaver --> distance increased by 2
  6. women's foil advanced shaver --> distance increased by 2
  7. men's shavers --> substring match for "shavers"
  8. women's shavers --> substring match for "shavers"
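The ordering described in the list above can be stated as a sort key. The following is a plain-Python sketch of the intended ranking, not an Elasticsearch query. `rank_key` is a hypothetical helper, and the last tie-breaker ("men's" before "women's") is an assumption read off the listed order. The two "norelco" records are left out because the list does not say where they should rank.

```python
def rank_key(title, first="en's", last="shaver"):
    """Sort key for the desired order: exact last word, then word gap, then tightness."""
    words = title.lower().split()
    i = next(k for k, w in enumerate(words) if first in w)
    j = next(k for k, w in enumerate(words) if last in w)
    exact_last = 0 if words[j] == last else 1  # "shavers" is only a substring match
    gap = j - i - 1                            # number of words between the two matches
    extra = len(words[i]) - len(first)         # assumption: "men's" beats "women's"
    return (exact_last, gap, extra)


titles = [
    "women's shavers", "men's foil advanced shaver", "women's shaver",
    "men's shavers", "women's foil shaver", "men's shaver",
    "women's foil advanced shaver", "men's foil shaver",
]
ranked = sorted(titles, key=rank_key)
for position, title in enumerate(ranked, start=1):
    print(position, title)
```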

I'm running the following query, but it does not return the results in the order I want:

POST /my_index1/my_type1/_search
{
   "query": {
      "query_string": {
         "default_field": "text",
         "query": "men's shaver",
         "minimum_should_match": "90%"
      }
   }
}

Please suggest how to achieve the above result. Any suggestion will help.

*************************** UPDATE (6th May, 2014) ********************************
I made some changes:
1. using a multi-field
2. using only one shard
3. using analyzers, filters, and stemmers

Please see my settings below:

For index:

curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
   "settings": {
        "analysis": {
            "filter": {
                "trigrams_filter": {
                    "type":     "ngram",
                    "min_gram": 1,
                    "max_gram": 50
                },
                "my_stemmer": {
                    "type": "stemmer",
                    "name": "minimal_english"
                }
            },
            "analyzer": {
                "trigrams": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":   [
                        "standard",
                        "lowercase",
                        "trigrams_filter"
                    ]
                },
                "my_stemmer_analyzer":{
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":   [
                        "standard",
                        "lowercase",
                        "my_stemmer"
                    ]
                }
            }
        }
    }
}'

For mappings:

curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
{
    "my_improved_index_type": {
      "properties": {
         "name": {
            "type": "multi_field",
            "fields": {
               "name_gram": {
                  "type": "string",
                  "analyzer": "trigrams"
               },
               "untouched": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "name_stemmer":{
                   "type": "string",
                   "analyzer": "my_stemmer_analyzer"
               }
            }
         }
      }
   }
}'

Available documents:

  1. men's shaver
  2. men's shavers
  3. men's foil shaver
  4. men's foils shaver
  5. men's foil shavers
  6. men's foils shavers
  7. men's foil advanced shaver
  8. norelco men's foil advanced shaver

Query:

curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
   "size": 30,
   "query": {
      "bool": {
         "should": [
            {
               "match": {
                  "name.untouched": {
                     "query": "men\"s shaver",
                     "operator": "and",
                     "type": "phrase",
                     "boost": "10"
                  }
               }
            },
            {
               "match_phrase": {
                  "name.name_stemmer": {
                     "query": "men\"s shaver",
                     "slop": 5
                  }
               }
            }
         ]
      }
   }
}'
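One detail in the query above may be worth checking. Inside the single-quoted curl body, `men\"s` is the JSON escape for a double quote, so the text Elasticsearch receives is `men"s shaver` rather than `men's shaver`. The short Python check below shows the decoding. The `\u0027` form is one possible way to write an apostrophe inside a single-quoted shell string, assuming an apostrophe is what was intended.

```python
import json

# The "query" value exactly as it appears in the curl body above.
as_written = json.loads(r'{"query": "men\"s shaver"}')["query"]
print(as_written)  # men"s shaver

# Assumption: an apostrophe was intended. \u0027 is the JSON escape for "'"
# and contains no single quote, so it is safe inside a '...' shell string.
with_apostrophe = json.loads(r'{"query": "men\u0027s shaver"}')["query"]
print(with_apostrophe)  # men's shaver
```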

Returned result:

  1. men's shaver --> correct
  2. men's shavers --> correct
  3. men's foils shaver --> NOT correct
  4. norelco men's foil advanced shaver --> NOT correct
  5. men's foil advanced shaver --> NOT correct
  6. men's foil shaver --> NOT correct

Expected result:

  1. men's shaver --> exact phrase match
  2. men's shavers --> ZERO word distance + 1 plural
  3. men's foil shaver --> 1 word distance
  4. men's foils shaver --> 1 word distance + 1 plural
  5. men's foil advanced shaver --> 2 word distance
  6. norelco men's foil advanced shaver --> 2 word distance
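The expected order above puts word distance first and plurals second, which differs from the earlier list where plural matches came last. Here is a plain-Python sketch of that priority, not an Elasticsearch query. `update_key` is a hypothetical helper, the plural test is deliberately crude, and ranking a title with an extra leading word ("norelco") after its otherwise equal neighbour is an assumption read off the list.

```python
def update_key(title, first="men's", last="shaver"):
    """Sort key: word distance, then number of plural words, then leading extra words."""
    words = title.lower().split()
    i = words.index(first)
    # startswith() stands in for the stemmer mapping "shavers" to "shaver".
    j = next(k for k, w in enumerate(words) if w.startswith(last))
    gap = j - i - 1
    # Crude plural test: ends in "s" but is not a possessive such as "men's".
    plurals = sum(w.endswith("s") and not w.endswith("'s") for w in words)
    return (gap, plurals, i)


docs = [
    "norelco men's foil advanced shaver", "men's foils shaver", "men's shavers",
    "men's foil advanced shaver", "men's shaver", "men's foil shaver",
]
expected_order = sorted(docs, key=update_key)
for position, title in enumerate(expected_order, start=1):
    print(position, title, update_key(title))
```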

Why did the documents with a higher word distance score higher? How can I achieve the expected result? Is there a problem with my stemmer or nGram settings?

Kruti Shukla
  • May I know why you are using ngram analyzers? – BlackPOP Apr 23 '14 at 15:37
  • http://stackoverflow.com/questions/23243867/elasticsearch-find-substring-match – BlackPOP Apr 23 '14 at 15:58
  • @BlackPOP: I'm using ngram analyzers to get more combinations of the search keywords. For example, both "men's" and "en's" should be searchable. Yes, the other post is also mine, but the two posts deal with separate problems. – Kruti Shukla Apr 24 '14 at 04:16
  • I think you can use the same approach here: a not_analyzed field with a wildcard or regexp query. – BlackPOP Apr 24 '14 at 05:03
  • @BlackPOP: Using not_analyzed is not helping here. I'm trying to use a "span near" query, which can return in-order, distance-based results for results no. 3 to 6, but it returns nothing, maybe because the field is set to "not_analyzed". I tried a "query string" search for "*men's* *shaver*", but it returns the wrong sequence. Returned result --> 1. men's shaver 2. women's shaver and so on. Expected result --> 1. men's shaver 2. men's foil shaver. Only once all the men's records are listed should the women's (substring) results be shown. I think "multi field" might help me. I will try. – Kruti Shukla Apr 24 '14 at 09:01
  • @KrutiShukla How did you resolve this issue? I am facing the same issue. If you found a solution, please add it here. Thanks – Sanjay Khatri Dec 10 '16 at 06:27
