I am using Python and Elasticsearch to process large amounts of data. When I use the Search API, the response contains the requested documents in a "hits" list:
{
  ...
  "hits" : {
    ...
    "hits" : [
      { "_source": {...} },
      { "_source": {...} },
      { "_source": {...} }
    ]
    ...
  }
}
However, each document is nested inside a _source field rather than returned as the raw document I wanted (and expected) from Elasticsearch. To make the data usable, I have to pull each document out of its hits.hits[]._source field into a new list, like this:
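def extract_items(es_response):  # placeholder name for the enclosing function
    # Each search hit wraps the stored document in its "_source" field
    hits = es_response.get("hits", {}).get("hits", [])
    items = []
    for hit in hits:
        items.append(hit.get("_source"))
    return {
        "items": items
    }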
Ideally, I would prefer not to have to extract each document from the response into a list. Is there a way to configure Elasticsearch to return the document data NOT nested in _source? If not, is my solution the best way to work around this? I was thinking of using Python generators, but I need to check whether they fit my use case better (I believe they can be slower but use less memory).
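For reference, the generator variant I have in mind would look roughly like this. It is only a sketch: iter_sources and sample_response are names I made up for illustration, and the sample data stands in for a real search response.

```python
from typing import Any, Dict, Iterator


def iter_sources(es_response: Dict[str, Any]) -> Iterator[Any]:
    """Yield each hit's _source lazily instead of building a second list."""
    for hit in es_response.get("hits", {}).get("hits", []):
        yield hit.get("_source")


# Hypothetical response with the same shape as the example above
sample_response = {
    "hits": {"hits": [{"_source": {"id": 1}}, {"_source": {"id": 2}}]}
}

# Materialize only where a real list is needed (e.g. for serialization)
items = list(iter_sources(sample_response))
```

Note that the parsed response is already fully in memory at this point, so the generator would only avoid allocating the second list of references, not the documents themselves.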
Note: I am aware of Elasticsearch's filter_path parameter, which lets you return ONLY the _source field (the response example above assumes this feature is in use), but each document is still nested within its own _source field and needs to be lifted up a level. This question is therefore not a duplicate of previously asked questions on this topic.
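For context, this is roughly how filter_path is applied in my case. It is a minimal sketch that assumes the official elasticsearch Python client (8.x), where search accepts query and filter_path as keyword arguments; search_sources_only and its arguments are placeholder names, and es is assumed to be an already-connected client.

```python
from typing import Any, Dict


def search_sources_only(es: Any, index: str, query: Dict[str, Any]) -> Dict[str, Any]:
    """Run a search and ask Elasticsearch to return only hits.hits._source.

    `es` is assumed to be an elasticsearch.Elasticsearch client instance.
    """
    # filter_path trims the response body; each document still sits under "_source"
    return es.search(index=index, query=query, filter_path=["hits.hits._source"])
```

Even with this filter, the response keeps the hits.hits[]._source nesting, which is why the extraction step is still needed.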