Elasticsearch partial substring search

Question

I am trying to implement partial substring search in Elasticsearch 7.1 using the following analyzer:

PUT my_index-001

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "autocomplete"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 40
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

After that I tried adding some sample data to my_index-001 with the type doc:

    PUT my_index-001/doc/1
    {
      "title": "ABBOT Series LTD 2014"
    }
 
    PUT my_index-001/doc/2
    {
      "title": "ABBOT PLO LTD 2014A"
    }
   
    PUT my_index-001/doc/3
    {
      "title": "ABBOT TXT"
    }
    PUT my_index-001/doc/4
    {
      "title": "ABBOT DMO LTD. 2016-II"
    }

The query I used to perform the partial search:

GET my_index-001/_search
{
  "query": {
    "match": {
      "title": {
        "query": "ABB",
        "operator": "or"
      }
    }
  }
}

I was expecting the following results from the analyzer:

If I type in ABB, I should get doc IDs 1, 2, 3, 4
If I type in ABB 2014, I should get doc IDs 1, 2
If I type in ABBO PLO, I should get doc 2
If I type in TXT, I should get doc 3

With the above analyzer settings I am not getting the expected results. Please let me know if I am missing anything in my Elasticsearch analyzer settings.

score 1 · Accepted Answer · answered Apr 01 '21 at 21:17

You were almost there but there are a couple of issues.

When creating index mappings through Kibana Dev Tools, there mustn't be any whitespace between the URI and the request body. Your first code snippet has a blank line there, which caused ES to ignore the request body entirely! So remove that whitespace.

The maximum ngram difference (max_gram minus min_gram) is set to 1 by default. In order to use your wide ngram range (2 to 40), you'll need to explicitly increase the index-level setting max_ngram_diff:

PUT my_index-001
{
  "settings": {
    "index": {
      "max_ngram_diff": 40   <--
    },
    ...
  }
}
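
To make the limit concrete, here is a minimal Python sketch of the check that is applied to the filter settings. The function name is hypothetical and only mirrors the rule described above; Elasticsearch does this validation itself when the index is created.

```python
# Default value of the index-level setting index.max_ngram_diff.
DEFAULT_MAX_NGRAM_DIFF = 1


def ngram_diff_allowed(min_gram, max_gram, max_ngram_diff=DEFAULT_MAX_NGRAM_DIFF):
    """Return True if max_gram - min_gram fits within the allowed difference.

    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    return (max_gram - min_gram) <= max_ngram_diff


# The filter from the question: 40 - 2 = 38, far above the default of 1.
print(ngram_diff_allowed(2, 40))
# With max_ngram_diff raised to 40, the same filter is accepted.
print(ngram_diff_allowed(2, 40, max_ngram_diff=40))
```

Any value of at least 38 would be enough for min_gram 2 and max_gram 40; 40 simply leaves a little headroom.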

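Mapping type names are deprecated in v7, and so is the nGram token filter name in favor of ngram (lowercase g). The string field type is no longer available either; it was replaced by text and keyword, and text is the one you want for an analyzed field. Here's the corrected PUT request body: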

PUT my_index-001                  <--- no whitespace after the URI!
{
  "settings": {
    "index": {
      "max_ngram_diff": 40        <--- explicit setting
    },
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "autocomplete"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "type": "ngram",         <--- ngram, not nGram
          "min_gram": 2,
          "max_gram": 40
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",            <--- text, not string
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

Since custom mapping types have been deprecated in favor of the generic _doc type, you'll need to adjust the way you insert documents. Luckily, the only difference is changing doc to _doc in the URI:

PUT my_index-001/_doc/1
{ "title": "ABBOT Series LTD 2014" }
 
PUT my_index-001/_doc/2
{ "title": "ABBOT PLO LTD 2014A" }
   
PUT my_index-001/_doc/3
{ "title": "ABBOT TXT" } 

PUT my_index-001/_doc/4
{ "title": "ABBOT DMO LTD. 2016-II" }

Finally, your query is fine and should behave the way you expect. The only change needed is to set the operator to and when querying for two or more substrings, because with or a document that matches just one of the substrings is returned too, i.e.:

GET my_index-001/_search
{
  "query": {
    "match": {
      "title": {
        "query": "ABB 2014",
        "operator": "and"
      }
    }
  }
}

Other than that, all four of your test scenarios should return what you expect.
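
If you want to reason about the four scenarios without a cluster, the sketch below imitates the two analyzers in plain Python. It is only an approximation under stated assumptions: the helper names are made up for illustration, str.split() stands in for the whitespace tokenizer, and relevance scoring is not modeled at all. It only shows which documents share terms with the query.

```python
def ngrams(token, min_gram=2, max_gram=40):
    """All substrings of token whose length is between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text):
    # Imitates the "autocomplete" analyzer: whitespace -> lowercase -> ngram.
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms


def search_terms(text):
    # Imitates "autocomplete_search": whitespace -> lowercase, no ngrams.
    return text.lower().split()


DOCS = {
    1: "ABBOT Series LTD 2014",
    2: "ABBOT PLO LTD 2014A",
    3: "ABBOT TXT",
    4: "ABBOT DMO LTD. 2016-II",
}
INDEX = {doc_id: index_terms(title) for doc_id, title in DOCS.items()}


def match(query, operator="or"):
    """Return the ids of documents matching any ("or") or all ("and") query terms."""
    terms = search_terms(query)
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, indexed in INDEX.items()
        if combine(term in indexed for term in terms)
    )


print(match("ABB"))                       # [1, 2, 3, 4]
print(match("ABB 2014", operator="and"))  # [1, 2]
print(match("ABBO PLO", operator="and"))  # [2]
print(match("TXT"))                       # [3]
print(match("ABB 2014", operator="or"))   # [1, 2, 3, 4]
```

The last line shows why the operator matters: with or, every document containing "abb" comes back even though only two of them contain "2014".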

Thanks, that works really well. Also, what is the difference between these two settings in the mapping section: "analyzer": "autocomplete" and "search_analyzer": "autocomplete_search"? Although I had them in my initial script, I was not completely able to understand how they are used. — amrit, Apr 02 '21 at 06:25
You're welcome. The difference between these two is explained [here](https://stackoverflow.com/a/15932838/8160318). — Joe - GMapsBook.com, Apr 02 '21 at 07:59
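
As a rough illustration of that difference, the Python sketch below compares a query that is only lowercased (what autocomplete_search does) with one that is also split into ngrams (what would happen if autocomplete were used at search time too). The helper names and the "CABLE TXT" title are hypothetical, and str.split() stands in for the whitespace tokenizer.

```python
def ngrams(token, min_gram=2, max_gram=40):
    """All substrings of token whose length is between min_gram and max_gram."""
    return {
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    }


def analyze(text, use_ngrams):
    # Whitespace split + lowercase, optionally followed by the ngram filter.
    tokens = text.lower().split()
    if not use_ngrams:
        return set(tokens)
    terms = set()
    for token in tokens:
        terms |= ngrams(token)
    return terms


# Hypothetical document that does NOT contain "abb", indexed with ngrams.
indexed = analyze("CABLE TXT", use_ngrams=True)

plain_query = analyze("ABB", use_ngrams=False)   # {"abb"}
ngram_query = analyze("ABB", use_ngrams=True)    # {"ab", "bb", "abb"}

print(bool(plain_query & indexed))  # False: "abb" is not a substring of the title
print(bool(ngram_query & indexed))  # True: the fragment "ab" occurs in "cable"
```

So the index-time analyzer produces the fragments, while the search-time analyzer keeps the typed text whole; otherwise short fragments of the query would match unrelated documents.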
