I'm a bit puzzled by how term-level queries behave on text fields (I'm not even sure it's OK to use them on text fields).

This is my index, which uses the standard analyzer:

{
  "my-index-000001" : {
    "mappings" : {
      "properties" : {
        "city" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}

And this is the data it has so far:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "New York"
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "city" : "York"
        }
      }
    ]
  }
}

This query matches both documents in the index:

GET my-index-000001/_search
{
  "from": 0,
  "size": 20,
  "timeout": "20s",
  "query": {
    "wildcard": {
      "city": {
        "value": "yor*"
      }
    }
  }
}

As you can see, the casing in the query doesn't match the casing in either document (both contain "York"). If I query for "yOR*" instead, both documents still match. When I query the "city.raw" field, which is a keyword field, there is no match.

According to the docs, term-level queries should not analyze the search terms, which does not seem to hold when the field type is text. Is this intended, or is it a bug? Is it safe to use term-level queries on text fields, and if not, why?

Thank you.

user1934513
  • Forgot to mention: I'm using Elasticsearch version 7.10.0 with Lucene 8.7.0, in a Docker environment built from the image docker.elastic.co/elasticsearch/elasticsearch:7.10.0 – user1934513 Dec 13 '22 at 11:21

1 Answer

When a field is of the "keyword" type, its value is indexed as-is, as a single term, without being analyzed at index time.

For example: "New York" is indexed as the single term "New York".

When a field is of the "text" type, its value is analyzed at index time, and the resulting tokens are what get indexed.

For example: with the standard analyzer, "New York" is broken down into the terms "new" and "york".

As a result, searching for "yor*" on the "city" field matches both documents, because each of them contains the indexed term "york".
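The difference between the two field types can be sketched in a few lines of Python. This is only a rough model, not Elasticsearch's implementation: `standard_analyze` is a hypothetical stand-in that splits on non-word characters and lowercases, and `fnmatch.fnmatchcase` stands in for case-sensitive wildcard matching against indexed terms.

```python
import fnmatch
import re

def standard_analyze(text):
    # Hypothetical stand-in for the standard analyzer:
    # split into word tokens, then lowercase each token.
    return [token.lower() for token in re.findall(r"\w+", text)]

def search(index, pattern):
    # Return the ids of documents with at least one indexed term matching
    # the pattern. The pattern is compared as-is (no analysis, case-sensitive).
    return sorted(
        doc_id
        for doc_id, terms in index.items()
        if any(fnmatch.fnmatchcase(term, pattern) for term in terms)
    )

docs = {"1": "New York", "2": "York"}

# "city" (text): analyzed at index time into lowercase tokens.
text_index = {doc_id: standard_analyze(value) for doc_id, value in docs.items()}
# "city.raw" (keyword): the whole value is kept as one term.
raw_index = {doc_id: [value] for doc_id, value in docs.items()}

print(text_index)                  # {'1': ['new', 'york'], '2': ['york']}
print(search(text_index, "yor*"))  # ['1', '2']
print(search(raw_index, "yor*"))   # []
print(search(text_index, "yOR*"))  # []
```

In this model "yOR*" matches nothing, because the pattern is never lowercased. It would match both documents only if the pattern were normalized first, e.g. `search(text_index, "yOR*".lower())`.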

The documentation also states that term-level queries operate on the terms as they are indexed in Elasticsearch and do not perform any search-time analysis:

Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.

However, it is best to use term-level queries on keyword fields.
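To illustrate why keyword fields are the more predictable target, here is a minimal Python sketch of wildcard matching against whole, unanalyzed values. It is an analogy only: `raw_index` and `wildcard_on_keyword` are hypothetical names, and `fnmatch.fnmatchcase` merely approximates case-sensitive wildcard matching on the exact indexed value.

```python
import fnmatch

# "city.raw" (keyword): each document's value is indexed unchanged as one term.
raw_index = {"1": "New York", "2": "York"}

def wildcard_on_keyword(pattern):
    # The pattern must match the entire value, with exact casing.
    return sorted(
        doc_id
        for doc_id, value in raw_index.items()
        if fnmatch.fnmatchcase(value, pattern)
    )

print(wildcard_on_keyword("Yor*"))   # ['2']
print(wildcard_on_keyword("New*"))   # ['1']
print(wildcard_on_keyword("*York"))  # ['1', '2']
print(wildcard_on_keyword("yor*"))   # []
```

What you search for is exactly what was indexed, so the casing and the position of the wildcard in the pattern fully determine the result.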

ESCoder
  • As I mentioned in the question, searching for yOR* also matches both documents, which suggests that the search term is being analyzed – user1934513 Dec 13 '22 at 11:59
  • @user1934513 I tried the query with `yOr*`, but it returns 0 results for me when searching on the `city` field. Can you please check again? – ESCoder Dec 13 '22 at 12:12
  • GET my-index-000001/_search { "query": { "wildcard": { "city": { "value": "yOR*" } } } } This query returns both documents – user1934513 Dec 13 '22 at 12:47
  • Can you please check which Elasticsearch version you use? – user1934513 Dec 13 '22 at 12:49
  • I use Elasticsearch version 8.3.2, but I don't think this will make any difference. Can you please share your index mappings and settings? – ESCoder Dec 13 '22 at 13:30
  • I actually updated to 8.5.3 and now I get the same results as you. I replicated the same index from scratch a few times and it's consistent. – user1934513 Dec 13 '22 at 14:04
  • @user1934513 Great. It's a bit odd, though, that it didn't give the expected output in version 7.10. If my answer helped you, please don't forget to upvote and accept it – ESCoder Dec 13 '22 at 15:10