
I am working on an Elasticsearch (6.2) project where the index has many keyword fields, and they are normalized with a lowercase filter to allow case-insensitive searches. The search works great and returns the actual (not lowercased) values of the normalized fields. However, the aggregations do not return the actual values; they return the lowercased ones.

The following example is taken from the Elasticsearch documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/master/normalizer.html

Creating the index:

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

Inserting the documents:

PUT index/_doc/1
{
  "foo": "Bar"
}

PUT index/_doc/2
{
  "foo": "Baz"
}

Searching with an aggregation:

GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

Result:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 1
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}

If you check the aggregation, you will see that the lowercased value has been returned, e.g. "key": "bar".
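To see why the buckets come back lowercased, here is a rough Python simulation (not Elasticsearch code). It assumes that the keyword normalizer is applied to the value before it is indexed and written to doc values, that a terms aggregation counts those stored terms, and that `_source` keeps the original JSON untouched. The names `normalize`, `index_docs` and `terms_agg` are hypothetical, and `asciifolding` is only approximated by stripping combining marks after Unicode decomposition.

```python
import unicodedata
from collections import Counter


def normalize(value):
    """Approximate my_normalizer: lowercase, then a rough asciifolding.

    The real asciifolding filter maps many more characters; removing
    combining marks after NFKD decomposition is only an approximation.
    """
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_docs(sources):
    """Pair each stored _source with the normalized term the field indexes."""
    return [(src, normalize(src["foo"])) for src in sources]


def terms_agg(indexed):
    """Count buckets over the normalized terms, like a terms aggregation."""
    counts = Counter(term for _, term in indexed)
    # Sorted by key only to make this sketch's output stable.
    return [{"key": k, "doc_count": c} for k, c in sorted(counts.items())]


indexed = index_docs([{"foo": "Bar"}, {"foo": "Baz"}])
print([src for src, _ in indexed])  # _source keeps the original casing
print(terms_agg(indexed))           # bucket keys are the normalized terms
```

In this model, as in the question, the original casing survives only in `_source`, which the aggregation never reads.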

Is there any way to change the aggregation to return the actual value, e.g. "key": "Bar"?

Anam

1 Answer


If you want to do case-insensitive search yet return exact values in your aggregations, you don't need any normalizer. You can simply have a text field (whose default analyzer lowercases the tokens and therefore allows case-insensitive search) with a keyword sub-field. You'd use the former for search and the latter for aggregations. It goes like this:

PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
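A rough Python model of this mapping, under stated assumptions: the `text` field `foo` is approximated as "split into word tokens and lowercase" (the real standard analyzer uses Unicode text segmentation), while the `foo.keyword` sub-field stores the value unchanged because it has no normalizer. The names `analyze_text`, `index_doc` and `terms_on_keyword` are hypothetical and only illustrate why the aggregation keeps the original casing.

```python
import re
from collections import Counter
from typing import Dict, List


def analyze_text(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [tok.lower() for tok in re.findall(r"\w+", value)]


def index_doc(value: str) -> Dict[str, object]:
    """Model one document: analyzed tokens for foo, raw value for foo.keyword."""
    return {"foo": analyze_text(value), "foo.keyword": value}


def terms_on_keyword(docs: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Count buckets over foo.keyword, which was stored without normalization."""
    counts = Counter(d["foo.keyword"] for d in docs)
    # Sorted by key only to make this sketch's output stable.
    return [{"key": k, "doc_count": c} for k, c in sorted(counts.items())]


docs = [index_doc("Bar"), index_doc("Baz")]
print([d["foo"] for d in docs])  # the text field holds lowercased tokens
print(terms_on_keyword(docs))    # buckets keep the original casing
```

The design choice is simply that the same value is indexed twice: once analyzed for search and once verbatim for aggregations.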

After indexing your two documents, you can issue a terms aggregation on foo.keyword:

GET index/_search
{
  "size": 2,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}

And the result would look like this:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "foo": "Baz"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "foo": "Bar"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Bar",
          "doc_count": 1
        },
        {
          "key": "Baz",
          "doc_count": 1
        }
      ]
    }
  }
}
Val
  • Thank you for your answer. Does it allow exact match search like keyword? – Anam Aug 03 '18 at 04:18
  • Yes, by searching on `foo.keyword` you can search exact values and when searching on `foo` you can search case insensitively. – Val Aug 03 '18 at 05:13
  • This fits my use case very well. However, in addition to the above two conditions (case-insensitive search and aggregation), it also needs wildcard support while still retaining the original value after aggregation. How should the mapping be formulated? – Marcus Lim Dec 02 '21 at 02:57
  • @MarcusLim please create a new question (maybe referencing this one) and explain your additional use case. – Val Dec 02 '21 at 04:37
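Val's point about querying `foo.keyword` for exact values and `foo` for case-insensitive matches can be modelled the same way. This is a hypothetical Python sketch, not client code: `term_on_keyword` stands in for a `term` query against the un-normalized keyword sub-field (exact and case-sensitive), and `match_on_text` stands in for a `match` query against the analyzed text field, with the analyzer again approximated as "word tokens, lowercased".

```python
import re
from typing import List


def analyze(value: str) -> List[str]:
    # Approximation of the standard analyzer: word tokens, lowercased.
    return [tok.lower() for tok in re.findall(r"\w+", value)]


def term_on_keyword(values: List[str], query: str) -> List[str]:
    # Exact lookup: the keyword sub-field has no normalizer, so case matters.
    return [v for v in values if v == query]


def match_on_text(values: List[str], query: str) -> List[str]:
    # Analyzed lookup: query and field are both lowercased, so case is ignored.
    wanted = set(analyze(query))
    return [v for v in values if wanted & set(analyze(v))]


values = ["Bar", "Baz"]
print(term_on_keyword(values, "Bar"))  # ['Bar']
print(term_on_keyword(values, "bar"))  # []
print(match_on_text(values, "bar"))    # ['Bar']
```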