My index has two fields, "name" and "email". The "name" field should support n-grams of up to 10 characters. I created the index with the following mappings and settings:

{
  "mappings": {
    "properties": {
      "email": {"type": "text", "analyzer": "keyword_analyzer"},
      "name": {"type": "text", "analyzer": "keyword_analyzer"}
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keyword_analyzer": {"filter": ["lowercase"], "tokenizer": "keyword"}
        }
      },
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "max_ngram_diff": "5"
    }
  }
}

I then inserted some data:

{"name": "divya p", "email": "divya@email.com"}
{"name": "divya a", "email": "divya12123@email.com"}
{"name": "kumar a", "email": "kumar@email.com"}
{"name": "aruna v", "email": "aruna@email.com"}

I want to find all documents whose name contains "divya", so I used this query:

{"query": {"bool": {"should": [{"match": {"name": "divya"} } ] } } }

It doesn't return any results, but if I give the full name, I do get results:

{"query": {"bool": {"must": [{"match": {"name": "divya a"} } ] } } }

To troubleshoot, I modified the settings by adding a "filter", as mentioned in the link's first answer. But now when I run the query, I get all 4 inserted documents back. Where did I go wrong? My modified settings are below:

"settings": {"index": {"analysis": {"analyzer": {"keyword_analyzer": {"filter": ["lowercase", "ngram_filter"], "type": "custom", "tokenizer": "standard"} }, "filter": {"ngram_filter": {"type": "nGram", "min_gram": "1", "max_gram": "5"} } }, "number_of_shards": "1", "number_of_replicas": "0", "max_ngram_diff" : "50"} }
Raady

1 Answer

Based on the index settings and mapping you provided, the name field uses keyword_analyzer, whose tokenizer is keyword.

The keyword tokenizer emits the entire input as a single token, so the field is not split into words. You can check the tokens generated for divya p with the analyze API:

GET /queidx/_analyze
{
  "analyzer": "keyword_analyzer",
  "text": "divya p"
}

The generated token is:

{
  "tokens" : [
    {
      "token" : "divya p",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    }
  ]
}


So when you search for names that contain "divya", you get an empty result: the only indexed tokens are whole names such as divya p, and no document has a token that is exactly "divya". Searching for the full name works because "divya a" is itself one of those tokens.
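To make the behaviour above concrete, here is a small Python sketch. It is only a simplified model, not Elasticsearch itself: keyword_analyzer and match are hypothetical helper names, and match just checks whether the analyzed query shares a token with the analyzed document.

```python
# Simplified model of the question's keyword_analyzer:
# keyword tokenizer (whole input becomes one token) + lowercase filter.
def keyword_analyzer(text):
    return [text.lower()]


# The four names inserted in the question.
names = ["divya p", "divya a", "kumar a", "aruna v"]


def match(query, docs, analyzer):
    """Return the docs that share at least one token with the analyzed query."""
    query_tokens = set(analyzer(query))
    return [doc for doc in docs if query_tokens & set(analyzer(doc))]


print(match("divya", names, keyword_analyzer))    # []
print(match("divya a", names, keyword_analyzer))  # ['divya a']
```

The query token divya never equals a stored token such as divya p, which is why only the full name finds anything.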

If you don't want to change the mapping and settings of the index, you can use a wildcard query to get all documents whose name begins with "divya":

POST queidx/_search
{
  "query":{
    "wildcard": {
      "name": {
        "value": "divya*"
      }
    }
  }
}
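As a rough illustration of why this pattern works against the single-token names, the sketch below matches the pattern against the stored terms in plain Python. It assumes the terms are the lowercased full names, and it uses fnmatchcase only as a stand-in for the wildcard query (wildcard is a hypothetical helper name, not an Elasticsearch API).

```python
from fnmatch import fnmatchcase

# Terms stored for the name field under the question's keyword_analyzer:
# one lowercased term per document.
indexed_terms = ["divya p", "divya a", "kumar a", "aruna v"]


def wildcard(pattern, terms):
    # fnmatchcase is a stand-in here; it also understands [...] sets,
    # which this example does not use.
    return [term for term in terms if fnmatchcase(term, pattern)]


print(wildcard("divya*", indexed_terms))  # ['divya p', 'divya a']
```

The pattern is compared against each whole term, so the trailing * is what lets it cover the " p" and " a" suffixes.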

However, wildcard queries are expensive. If your use case is just to find all documents whose name contains the word "divya", I would advise changing your index so that the name field is of type text with the standard analyzer (which is the default), and then simply running a match query on the name field.
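The following Python sketch models this suggestion, and also shows one likely reason the n-gram settings in the question returned all four documents. It is a simplification under stated assumptions: standard_analyzer is approximated by lowercasing and splitting on non-word characters, ngram_analyzer mimics the question's modified analyzer (min_gram 1, max_gram 5), and match is a hypothetical helper with OR semantics. The optional search_analyzer argument mirrors the search_analyzer mapping parameter, which the question does not use.

```python
import re

names = ["divya p", "divya a", "kumar a", "aruna v"]


def standard_analyzer(text):
    # Rough stand-in for the standard analyzer: lowercase, split into words.
    return re.findall(r"\w+", text.lower())


def ngram_analyzer(text, min_gram=1, max_gram=5):
    # Rough stand-in for the question's modified analyzer:
    # standard tokenizer + lowercase + n-gram filter.
    grams = []
    for token in standard_analyzer(text):
        for size in range(min_gram, max_gram + 1):
            for start in range(len(token) - size + 1):
                grams.append(token[start:start + size])
    return grams


def match(query, docs, index_analyzer, search_analyzer=None):
    # Without a separate search analyzer, the query is analyzed like the docs.
    search_analyzer = search_analyzer or index_analyzer
    query_tokens = set(search_analyzer(query))
    return [doc for doc in docs if query_tokens & set(index_analyzer(doc))]


# Standard analyzer at index and search time: only the two "divya" names match.
print(match("divya", names, standard_analyzer))

# N-grams at index and search time: the query also becomes grams such as "a",
# and every name contains an "a", so all four documents match.
print(match("divya", names, ngram_analyzer))

# N-grams at index time only, whole words at search time.
print(match("divya", names, ngram_analyzer, standard_analyzer))
```

In this model, the second call returns every name because the one-character gram a from the query is present in each of them; analyzing the query as whole words avoids that.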

ESCoder

@Raady did you get a chance to go through the answer? Looking forward to your feedback ;-) – ESCoder Jan 22 '22 at 06:46