
I've found some answers, such as "Make elasticsearch only return certain fields?"

But they all rely on the _source field.

In my system, disk and network are both scarce resources.

I can't afford to store the _source field, and I don't need the _index or _score fields in the response.

Elasticsearch version: 5.5

The index mapping looks like this:

{
  "index_2020-04-08": {
    "mappings": {
      "type1": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "enabled": false
        },
        "properties": {
          "rank_score": {
            "type": "float"
          },
          "first_id": {
            "type": "keyword"
          },
          "second_id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

My query:

GET index_2020-04-08/type1/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "first_id": "hello"
        }
      }
    }
  },
  "size": 1000,
  "sort": [
    {
      "rank_score": {
        "order": "desc"
      }
    }
  ]
}

The search results I get:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "index_2020-04-08",
        "_type": "type1",
        "_id": "id_1",
        "_score": null,
        "sort": [
          0.06621722
        ]
      },
      {
        "_index": "index_2020-04-08",
        "_type": "type1",
        "_id": "id_2",
        "_score": null,
        "sort": [
          0.07864579
        ]
      }
    ]
  }
}

The results I want:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_id": "id_1"
      },
      {
        "_id": "id_2"
      }
    ]
  }
}

Is there a way to get this?

cnby

2 Answers


To return specific fields from a document, you need one of these two things:

  1. The _source field in your documents, which is enabled by default (your mapping disables it).
  2. Stored fields, which must be enabled manually for each field you want back.
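
As a rough sketch of option 2 on 5.x, assuming you wanted first_id returned with each hit: mark the field with "store": true in the mapping, then request it with stored_fields in the search body. I have not run this against your cluster. As far as I know, store cannot be switched on for a field that already exists, so it has to be set when the index is created, and it costs extra disk, which may rule it out in your case.

```
# Mapping from the question, with first_id additionally marked as stored
PUT index_2020-04-08
{
  "mappings": {
    "type1": {
      "_all": { "enabled": false },
      "_source": { "enabled": false },
      "properties": {
        "rank_score": { "type": "float" },
        "first_id": { "type": "keyword", "store": true },
        "second_id": { "type": "keyword" }
      }
    }
  }
}

# Ask for the stored field explicitly; it comes back under "fields" in each hit
GET index_2020-04-08/type1/_search
{
  "stored_fields": ["first_id"],
  "query": {
    "bool": {
      "filter": {
        "term": { "first_id": "hello" }
      }
    }
  }
}
```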

Since you only want the document IDs and some response metadata, you need neither of those. You can use the filter_path parameter to trim the response instead.

Here's an example that's close to what you want (just change the field list):

$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_id" : "8SEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "8iEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "8yEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "9CEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "9SEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "9iEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "9yEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "-CEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "-SEGSHEBzNscjCyQ18cg"
      },
      {
        "_id" : "-iEGSHEBzNscjCyQ18cg"
      }
    ]
  }
}
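
Applied to the query from the question, it might look like the following. This is a sketch I have not run against 5.5; filter_path is a URL parameter, so the request body stays as it was. On 5.x, hits.total should come back as a plain number, as in the question's output, rather than the object shown above. The hits keep the order given by sort even though the sort values themselves are filtered out.

```
GET index_2020-04-08/type1/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "first_id": "hello"
        }
      }
    }
  },
  "size": 1000,
  "sort": [
    {
      "rank_score": {
        "order": "desc"
      }
    }
  ]
}
```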
Alexandre Juma

Just to clarify, based on the SO question you linked: in that approach you're not storing the _source, you're requesting it from ES. The _source parameter is usually used to limit which fields are retrieved, e.g.

...
"_source": ["only", "fields", "I", "need"]
...

_score, _index, etc. are meta fields that are going to be retrieved no matter what. You can "hack" it a bit by setting the size to 0 and aggregating, e.g.

{
  "size": 0,
  "aggs": {
    "by_ids": {
      "terms": {
        "field": "_id"
      }
    }
  }
} 

which will save you a few bytes:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "by_ids" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Ac76WXEBnteqn982smh_",
          "doc_count" : 1
        },
        {
          "key" : "As77WXEBnteqn982EGgq",
          "doc_count" : 1
        }
      ]
    }
  }
}

but performing aggregations has a cost of its own.

Joe - GMapsBook.com