I am trying to implement partial substring search in Elasticsearch 7.1 using the following analyzer:

PUT my_index-001

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "autocomplete"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 40
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

After that I tried adding some sample data to my_index-001 with type doc:

    PUT my_index-001/doc/1
    {
      "title": "ABBOT Series LTD 2014"
    }
 
    PUT my_index-001/doc/2
    {
      "title": "ABBOT PLO LTD 2014A"
    }
   
    PUT my_index-001/doc/3
    {
      "title": "ABBOT TXT"
    }
    PUT my_index-001/doc/4
    {
      "title": "ABBOT DMO LTD. 2016-II"
    }

The query used to perform the partial search:

GET my_index-001/_search
{
  "query": {
    "match": {
      "title": {
        "query": "ABB",
        "operator": "or"
      }
    }
  }
}

I was expecting the following output from the analyzer:

  1. If I type in ABB, I should get doc IDs 1, 2, 3, 4

  2. If I type in ABB 2014, I should get doc IDs 1, 2

  3. If I type in ABBO PLO, I should get doc 2

  4. If I type in TXT, I should get doc 3

With the above analyzer settings I am not getting the expected results. Please let me know if I am missing anything in my Elasticsearch analyzer settings.

amrit

1 Answer

You were almost there but there are a couple of issues.

  1. When creating index mappings through Kibana Dev Tools, there mustn't be any whitespace between the URI and the request body. You have whitespace in the first code snippet which caused ES to ignore the request body entirely! So remove that whitespace.
  2. The maximum allowed difference between max_gram and min_gram is 1 by default. To use a range as wide as yours (2 to 40, a difference of 38), you'll need to explicitly increase the index-level setting max_ngram_diff:
PUT my_index-001
{
  "settings": {
    "index": {
      "max_ngram_diff": 40   <--
    },
    ...
  }
}
  3. Mapping type names are deprecated in v7. So is the nGram token filter, in favor of ngram (lowercase g). The string field type is gone as well; use text instead. Here's the corrected PUT request:
PUT my_index-001                  <--- no whitespace after the URI!
{
  "settings": {
    "index": {
      "max_ngram_diff": 40        <--- explicit setting
    },
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "autocomplete"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "type": "ngram",         <--- ngram, not nGram
          "min_gram": 2,
          "max_gram": 40
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",            <--- text, not string
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
  4. Since custom mapping types have been deprecated in favor of the generic _doc type, you'll need to adjust the way you insert documents. Luckily, the only difference is changing doc to _doc in the URI:
PUT my_index-001/_doc/1
{ "title": "ABBOT Series LTD 2014" }
 
PUT my_index-001/_doc/2
{ "title": "ABBOT PLO LTD 2014A" }
   
PUT my_index-001/_doc/3
{ "title": "ABBOT TXT" } 

PUT my_index-001/_doc/4
{ "title": "ABBOT DMO LTD. 2016-II" }
  5. Finally, your query is perfectly fine and should behave the way you expect it to. The only thing to change is to set the operator to and when querying for two or more substrings that must all match, i.e.:
GET my_index-001/_search
{
  "query": {
    "match": {
      "title": {
        "query": "ABB 2014",
        "operator": "and"
      }
    }
  }
}

Other than that, all four of your test scenarios should return what you expect.
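
To see why the four scenarios work out, the analysis chain can be approximated in plain Python. This is only a rough sketch and not Elasticsearch code: `ngrams`, `index_tokens`, `search_tokens`, and `match` are hypothetical helpers that imitate the whitespace tokenizer, the lowercase filter, the 2 to 40 character ngram filter, and the `or`/`and` operator of the match query. Relevance scoring is ignored entirely.

```python
# Rough emulation of the analyzers above.
# All helper names are illustrative assumptions, not an Elasticsearch API.

def ngrams(token, min_gram=2, max_gram=40):
    """Return every substring of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return {
        token[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(token) - size + 1)
    }

def index_tokens(text):
    """Imitates 'autocomplete': whitespace split, lowercase, then ngrams."""
    grams = set()
    for token in text.lower().split():
        grams |= ngrams(token)
    return grams

def search_tokens(text):
    """Imitates 'autocomplete_search': whitespace split and lowercase only."""
    return text.lower().split()

# The four sample documents from the question.
DOCS = {
    1: "ABBOT Series LTD 2014",
    2: "ABBOT PLO LTD 2014A",
    3: "ABBOT TXT",
    4: "ABBOT DMO LTD. 2016-II",
}
INDEX = {doc_id: index_tokens(title) for doc_id, title in DOCS.items()}

def match(query, operator="or"):
    """Return ids of documents whose indexed grams contain any/all query terms."""
    terms = search_tokens(query)
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, grams in INDEX.items()
        if combine(term in grams for term in terms)
    )

print(match("ABB"))              # [1, 2, 3, 4]
print(match("ABB 2014", "and"))  # [1, 2]
print(match("ABBO PLO", "and"))  # [2]
print(match("TXT"))              # [3]
print(match("ABB 2014", "or"))   # [1, 2, 3, 4]
```

The last call shows why the operator matters for multi-word input: with or, the term abb alone is enough for every document to match.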

Joe - GMapsBook.com
  • Thanks, that works really well. Also, what is the difference between these two settings in the mapping section: "analyzer": "autocomplete" and "search_analyzer": "autocomplete_search"? Although I had them in my initial script, I was not completely able to understand their usage. – amrit Apr 02 '21 at 06:25
  • You're welcome. The difference between these two is explained [here](https://stackoverflow.com/a/15932838/8160318). – Joe - GMapsBook.com Apr 02 '21 at 07:59
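
In short, analyzer is applied to field values when documents are indexed, while search_analyzer is applied to the query text at search time. The effect of keeping them different can be illustrated with a small Python emulation. It is a sketch under assumptions, not Elasticsearch code: `ngrams`, `autocomplete`, `autocomplete_search`, and `match_or` are hypothetical helpers that mimic the two analyzers and a match query with the default or operator.

```python
# Illustrative emulation only; these helpers are hypothetical,
# not part of any Elasticsearch client.

def ngrams(token, min_gram=2, max_gram=40):
    """Return every substring of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return {
        token[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(token) - size + 1)
    }

def autocomplete(text):
    """Index-time chain: whitespace split, lowercase, ngram filter."""
    grams = set()
    for token in text.lower().split():
        grams |= ngrams(token)
    return grams

def autocomplete_search(text):
    """Search-time chain: whitespace split and lowercase, no ngrams."""
    return set(text.lower().split())

TITLES = {
    1: "ABBOT Series LTD 2014",
    2: "ABBOT PLO LTD 2014A",
    3: "ABBOT TXT",
    4: "ABBOT DMO LTD. 2016-II",
}
INDEX = {doc_id: autocomplete(title) for doc_id, title in TITLES.items()}

def match_or(query_terms):
    """Ids of documents sharing at least one term with the query."""
    return sorted(doc_id for doc_id, grams in INDEX.items() if query_terms & grams)

# "ABX" is not a substring of any title.
with_search_analyzer = match_or(autocomplete_search("ABX"))
same_analyzer_both_sides = match_or(autocomplete("ABX"))

print(with_search_analyzer)      # []
print(same_analyzer_both_sides)  # [1, 2, 3, 4]
```

If the ngram chain also ran at search time, the query would be broken into ab, bx, and abx, and the fragment ab alone would match every title. Keeping the search side free of ngrams means the user's input is compared as a whole against the indexed fragments.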