
I have the following Elasticsearch query, which I expected to return only the documents whose email field equals myemail@gmail.com:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "myemail@gmail.com"
          }
        }
      ]
    }
  }
}

The mapping for the user type that is being searched is the following:

    {
      "users": {
        "mappings": {
          "user": {
            "properties": {
              "email": {
                "type": "string"
              },
              "name": {
                "type": "string",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  }
                }
              },
              "nickname": {
                "type": "string"
              }
            }
          }
        }
      }
    }

The following is a sample of the results returned from Elasticsearch:

 [{
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
       "email": "johnsmith@gmail.com",
       "name": "John Smith",
       "nickname": "jsmith"
    }
 },
 {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
       "email": "myemail@gmail.com",
       "name": "Walter White",
       "nickname": "wwhite"
    }
 },
 {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
       "email": "JimmyFallon@gmail.com",
       "name": "Jimmy Fallon",
       "nickname": "jfallon"
    }
 }]

Given the query above, I expected only documents whose email property is exactly 'myemail@gmail.com' to match.

How does the Elasticsearch DSL query need to change so that it returns only exact matches on email?

Deenadhayalan Manoharan
TheJediCowboy

1 Answer


The email field was tokenized at index time, which is why you are seeing these unexpected matches. When you indexed the document, the analyzer split the address into two tokens:

"myemail@gmail.com" => [ "myemail" , "gmail.com" ]

As a result, a search for either myemail or gmail.com on its own will match this document. The same analyzer is also applied to the search query, so when you search for john@gmail.com, the query string is broken into:

"john@gmail.com" => [ "john" , "gmail.com" ]

Because the "gmail.com" token is common to the search term and the indexed term, you get a match.

To override this behavior, declare the email field as not_analyzed. Tokenization then does not happen, and the entire string is indexed as a single term.

With "not_analyzed":

"john@gmail.com" => [ "john@gmail.com" ]

So change the mapping to the following and you should be good:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
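Once email is indexed as a single term, the query can also be written as a term query, which skips analysis of the search string and compares it to the indexed term as-is. The following is only a sketch that assumes the not_analyzed mapping above is already in place; it is a swapped-in alternative to the match query from the question, not something the original query used:

```json
{
  "query": {
    "term": {
      "email": "myemail@gmail.com"
    }
  }
}
```

Because nothing is lowercased or split on either side, the value has to match the stored string exactly, including case. Also keep in mind that the index setting of an existing field cannot be changed in place, so the index would most likely need to be recreated with the new mapping and the documents reindexed before this returns the expected results.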

I have described the problem more precisely and another approach to solve it here.

Vineeth Mohan
  • URL in answer is dead. – IROEGBU Feb 28 '19 at 16:44
  • Yes, the URL is not available anymore. I guess this is the updated URL: https://qbox.io/blog/elasticsearch-aggregation-custom-analyzer – Raj Rajeshwar Singh Rathore Aug 11 '19 at 13:30
  • What if you want the original tokenization? I have the same prob as OP. No matter what the query is, all results are being returned, for one specific field. – Kalnode Feb 25 '21 at 01:43
  • Ok, here is a way to maintain tokenization on the field, and still do a bool query on it: use the `.keyword` feature, available from Elasticsearch 5.0. Check answer [here](https://stackoverflow.com/a/50631178/4378314) and [here](https://stackoverflow.com/a/48875105/4378314). – Kalnode Feb 25 '21 at 01:48