18

For optimization purposes, I am trying to cut down my total field count. Before I do that, however, I want to get an idea of how many fields I actually have. There doesn't seem to be any information in the _stats endpoint, and I can't quite figure out how the migration tool does its field count calculation.

Is there some way, either with an endpoint or by other means, to get the total field count of a specified index?

Fairy

6 Answers

48

To build a bit further upon what the other answer provided, you can get the mapping and then simply count the number of times the word "type" appears in the output, which gives the number of fields, since each field needs a type:

curl -s -XGET localhost:9200/index/_mapping?pretty | grep type | wc -l
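As a hedged alternative to matching text, the same count can be taken from the parsed mapping by walking its structure. The sketch below assumes the typeless mapping layout (a "properties" object directly under "mappings"); the names iter_field_paths, count_fields and sample_mapping are made up for illustration. It counts object fields and sub-fields as well, and can optionally skip fields mapped with "index": false.

```python
def iter_field_paths(properties, prefix=""):
    """Yield (dotted_path, definition) for every field, object and sub-field."""
    for name, definition in properties.items():
        path = prefix + name
        yield path, definition
        # Object and nested fields keep their children under "properties".
        yield from iter_field_paths(definition.get("properties", {}), path + ".")
        # Multi-fields (e.g. a "keyword" sub-field) live under "fields".
        yield from iter_field_paths(definition.get("fields", {}), path + ".")


def count_fields(mapping, indexed_only=False):
    """Count the entries in one index's "mappings" object (assumed typeless)."""
    count = 0
    for _path, definition in iter_field_paths(mapping.get("properties", {})):
        # Optionally skip fields that are mapped but not indexed.
        if indexed_only and definition.get("index") is False:
            continue
        count += 1
    return count


# Hypothetical mapping, shaped like response["index"]["mappings"].
sample_mapping = {
    "properties": {
        "type": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "content_type": {"type": "keyword", "index": False},
        "user": {"properties": {"name": {"type": "text"}}},
    }
}

print(count_fields(sample_mapping))                     # 5
print(count_fields(sample_mapping, indexed_only=True))  # 4
```

Because the walk follows only "properties" and "fields", a field that happens to be named "type" or "content_type" is counted once, like any other field.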
Val
  • That is what I am looking for! Much appreciated! – Fairy Nov 17 '16 at 13:04
  • 1
    Is this accurate if you have multiple types, since the fields could overlap? Also, won't this count types in dynamic templates? It seems like you need the count of the unique fields, since that is what lucene cares about. – Christian Trimble Oct 18 '17 at 05:26
  • @ChristianTrimble feel free to ask a new question referencing this one – Val Oct 18 '17 at 05:27
  • 4
    @Val a field can have more than one "type": e.g. "datapath-id": { "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } }, "type": "text" } – Anish Nov 27 '18 at 23:59
  • 1
    seems, like https://stackoverflow.com/a/54218379/5215544 is more correct. – BaZZiliO Mar 31 '21 at 15:42
  • 2
    @BaZZiliO now probably, yet the `_field_caps` API was not available at the time the answer was contributed. – Val Mar 31 '21 at 15:54
  • This will not work if a field has been disabled for indexing with `"index": false` in the mappings. See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html – joctee Oct 03 '22 at 06:36
  • @joctee it does work because the field is still in the mapping, which is what this command retrieves. – Val Oct 03 '22 at 06:38
  • You are right. You get the number of fields in the mappings but you won't get the number of fields that are actually indexed. It's the number of indexed fields that you want to reduce for optimization purposes. And for that, this approach does not work. – joctee Oct 03 '22 at 06:47
  • Well, it's trivial to plug `jq` in the middle of the pipeline to filter out those fields. – Val Oct 03 '22 at 06:52
27

You can try this:

curl -s -XGET "http://localhost:9200/index/_field_caps?fields=*" | jq '.fields|length'
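The same count can be taken in Python. This is a minimal sketch that assumes the response carries a top-level "fields" object, as the jq filter above implies; fetch_field_caps and count_field_caps are hypothetical helper names, and dropping names that start with "_" is only a heuristic for leaving out built-in metadata fields.

```python
import json
import urllib.request


def fetch_field_caps(base_url, index):
    """Call the _field_caps endpoint for every field of one index."""
    url = f"{base_url}/{index}/_field_caps?fields=*"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def count_field_caps(response, include_metadata=True):
    """Count the entries under "fields", like jq '.fields|length'."""
    fields = response.get("fields", {})
    if include_metadata:
        return len(fields)
    # Heuristic (an assumption): metadata field names start with "_".
    return sum(1 for name in fields if not name.startswith("_"))


# Hypothetical, trimmed-down response used instead of a live cluster.
sample_response = {
    "indices": ["index"],
    "fields": {
        "_id": {"_id": {"type": "_id"}},
        "title": {"text": {"type": "text"}},
        "views": {"long": {"type": "long"}},
    },
}

print(count_field_caps(sample_response))                          # 3
print(count_field_caps(sample_response, include_metadata=False))  # 2
```

Against a running node, count_field_caps(fetch_field_caps("http://localhost:9200", "index")) should give the same number as the curl and jq pipeline.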
Fred
  • 6
    This is the correct answer IMO, not the accepted one. – いちにち Aug 20 '20 at 14:28
  • 3
    Disclaimer: When the accepted answer was contributed, the `_field_caps` API didn't exist. Also, just note that counting `_fields.length` is not enough to know the number of fields though, as there can be sub-fields as well – Val Mar 31 '21 at 15:56
10

Just a quick way to get a relative estimate in Kibana without writing a script (I don't believe this is 100% precise, but it's an easy way to tell if your dynamic fields are blowing up to huge numbers for some reason).

Run this query in the Kibana dev tools

GET /index_name/_mapping

Inside the Kibana output, perform a search for all instances of "type" (including quotes). This will count the instances and get you the answer. (In this example, 804)


This can be helpful if you are scratching your head as to why you're getting the [remote_transport_exception] error:

Limit of total fields [1000] in index [index_name] has been exceeded
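To see how close an index is to that limit, the count can be compared with the index setting index.mapping.total_fields.limit. The sketch below assumes the nested (non-flat) layout of the GET /index_name/_settings response and falls back to 1000, the value quoted in the error, when the setting is not present; total_fields_limit and headroom are hypothetical helper names.

```python
DEFAULT_LIMIT = 1000  # the limit quoted in the error message above


def total_fields_limit(settings_response, index_name):
    """Read index.mapping.total_fields.limit from a parsed _settings response."""
    index_settings = settings_response[index_name]["settings"].get("index", {})
    raw = index_settings.get("mapping", {}).get("total_fields", {}).get("limit")
    # Setting values come back as strings; absent means the default applies.
    return int(raw) if raw is not None else DEFAULT_LIMIT


def headroom(field_count, limit):
    """Number of fields that can still be added before the limit is hit."""
    return limit - field_count


# Hypothetical response for an index that does not override the limit.
sample_settings = {"index_name": {"settings": {"index": {"number_of_shards": "1"}}}}

limit = total_fields_limit(sample_settings, "index_name")
print(limit)                 # 1000
print(headroom(804, limit))  # 196, using the count of 804 from the example
```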

Matthew Rideout
6

The first answer by Val solves the problem for me too. But I just wanted to list out some corner cases which can lead to misleading numbers.

  1. The mapping has field names that contain the word "type".

For example

"content_type" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword"
    }
  }
},

grep type matches this three times when it should match only twice, i.e. it should not match "content_type". This scenario has an easy fix.

Instead of

curl -s -XGET localhost:9200/index/_mapping?pretty | grep type 

Use

curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"'

to get an exact match of '"type"'

  2. The mapping has a field with the exact name "type"

For example

"type" : {
  "type" : "text",
   "fields" : {
     "keyword" : {
       "type" : "keyword"
     }
   }
},

In this case too, there are three matches instead of two. But using

curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"'

is not going to cut it, because the field name itself is an exact match for '"type"'. The line that declares the field contains "{", whereas a type declaration does not, so we can add a filter that drops lines containing "{":

curl -s -XGET localhost:9200/index/_mapping?pretty |\
grep '"type"' | grep -v "{"

In addition to the above two scenarios, if you are using the API programmatically to push numbers for tracking, e.g. into something like AWS CloudWatch or Graphite, you can use the following code. It calls the API, gets the data, and recursively searches for the key "type", skipping fuzzy matches and descending into fields that are themselves named "type".

import sys
import json
import requests

# The following find function is a minor edit of the function posted here
# https://stackoverflow.com/questions/9807634/find-all-occurrences-of-a-key-in-nested-python-dictionaries-and-lists

def find(key, value):
  # Yield every non-container value stored under `key`, at any depth.
  for k, v in value.items():
    if k == key and not isinstance(v, (dict, list)):
      yield v
    elif isinstance(v, dict):
      # Also covers a field literally named "type": descend into its definition.
      yield from find(key, v)
    elif isinstance(v, list):
      for d in v:
        if isinstance(d, dict):  # skip scalars such as lists of strings
          yield from find(key, d)

def get_index_type_count(es_host):
  try:
    response = requests.get('https://%s/_mapping/' % es_host)
    response.raise_for_status()
  except requests.RequestException as ex:
    print('Failed to get response - %s' % ex)
    sys.exit(1)

  indices_mapping_data = response.json()
  output = {}

  # One entry per index: the number of "type" declarations in its mapping.
  for index, mapping_data in indices_mapping_data.items():
    output[index] = len(list(find('type', mapping_data)))

  return output

if __name__ == '__main__':
  print(json.dumps(get_index_type_count(sys.argv[1]), indent=2))

The above code is also posted as a gist here - https://gist.github.com/saurabh-hirani/e8cbc96844307a41ff4bc8aa8ebd7459

Saurabh Hirani
1

You can get that information with the _mapping endpoint of the index API, see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html

The get mapping API allows to retrieve mapping definitions for an index or index/type.

GET /twitter/_mapping/tweet

With curl: curl [elasticsearch address]/[index]/_mapping?pretty

baudsp
0

A field can have more than one "type": e.g.

"datapath-id": {
    "fields": {
        "keyword": {
            "ignore_above": 256, 
            "type": "keyword"
        }
    }, 
    "type": "text"
}

To count each field only once, we can ignore any "type" that is nested inside "fields". One example:

import json


def myprint(d, field_count):
    for k, v in d.items():
        if isinstance(v, dict):
            # Do not descend into "fields", so sub-field types are not counted.
            if k != "fields":
                field_count = myprint(v, field_count)
        elif k == "type":
            # Count only "type" entries; other leaves such as "ignore_above"
            # or "analyzer" are not fields.
            print("{0} : {1}".format(k, v))
            field_count += 1
    return field_count

with open("output/mappings.json") as f:
    d = json.load(f)
    final_field_count = myprint(d, field_count=0)
    print("field count", final_field_count)
Anish