ElasticSearch query to pandas dataframe

Question

I have a query:

s = Search(using=client, index='myindex', doc_type='mytype')
s.query = Q('bool', must=[Q('match', BusinessUnit=bunit),
                          Q('range', **dicdate)])

res = s.execute()

The query matches 627033 documents, and I want to convert the result into a dataframe with 627033 rows.

Can you give more information about the output of the ElasticSearch query? If it is simply a dictionary, the question should be about converting a dictionary to a dataframe. There are many answers on this, for example https://stackoverflow.com/questions/34589332/python-dictionary-to-pandas-dataframe - Nelson Dinh, Sep 28 '17 at 15:11
Actually, it is not the dictionary format that I am asking about. The query always returns only 10 elements, and I want all of them. - Náthali, Sep 28 '17 at 16:55

score 3 · Answer 1 · answered May 11 '19 at 20:54

If your request is likely to return more than 10,000 documents from Elasticsearch, you will need to use the scrolling function of Elasticsearch. Documentation and examples for this function are rather difficult to find, so I will provide you with a full, working example:

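import pandas as pd
from elasticsearch import Elasticsearch
import elasticsearch.helpers


# Connect to the cluster (replace host, credentials and port with your own)
es = Elasticsearch('127.0.0.1',
        http_auth=('my_username', 'my_password'),
        port=9200)

# Match every document in the index; swap in your own query here
body={"query": {"match_all": {}}}
# helpers.scan wraps the scroll API and yields every matching hit, not just the first page
results = elasticsearch.helpers.scan(es, query=body, index="my_index")
# Each hit keeps the stored document under '_source'; one document becomes one row
df = pd.DataFrame.from_dict([document['_source'] for document in results])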

Simply edit the fields that start with "my_" to correspond to your own values.
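The question builds its query with elasticsearch_dsl's `Search` object rather than the low-level client. Here is a minimal sketch of the same idea for that case, assuming `s` is the `Search` from the question: `s.scan()` iterates over every matching document instead of the default first 10. The helper names `hits_to_records` and `search_to_dataframe` are made up for illustration, and keeping the document id in an `_id` column is my own choice, not something the question asks for.

```python
from typing import Any, Dict, Iterable, Iterator


def hits_to_records(hits: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one plain dict per elasticsearch_dsl hit, with the id added as '_id'."""
    for hit in hits:
        # Copy so that adding '_id' does not modify the hit itself
        record = dict(hit.to_dict())
        record["_id"] = hit.meta.id
        yield record


def search_to_dataframe(search):
    """Build a DataFrame from every document matched by an elasticsearch_dsl Search."""
    # Imported here so hits_to_records can be used without pandas installed
    import pandas as pd

    # search.scan() scrolls through all matches rather than returning one page
    return pd.DataFrame(list(hits_to_records(search.scan())))
```

With the query from the question this would be called as `df = search_to_dataframe(s)`.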

score 2 · Answer 2 · answered Dec 12 '17 at 19:09

Based on your comment I think what you're looking for is size:

es.search(index="my-index", doc_type="mydocs", body="your search", size="1000")

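This will not work for 627,033 lines with default settings, because a single search can return at most 10,000 hits (the `index.max_result_window` limit). You will need scroll for that.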

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
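For reference, a rough sketch of what a manual scroll loop could look like with the low-level Python client. It assumes a 7.x-style client where `search` accepts `body`, `scroll` and `size`. The function name `scroll_all` and its defaults are made up for illustration. In practice `elasticsearch.helpers.scan` does the same job with less code.

```python
def scroll_all(es, index, body, page_size=1000, keep_alive="2m"):
    """Yield every hit for a query by paging through the scroll API."""
    # The first request opens the scroll context and returns the first page
    resp = es.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    try:
        # An empty page means the scroll is exhausted
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Fetch the next page and renew the keep-alive
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the server-side scroll context
        es.clear_scroll(scroll_id=scroll_id)
```

The hits can then be turned into rows the usual way, for example `pd.DataFrame([hit['_source'] for hit in scroll_all(es, 'my-index', body)])`.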

score 0 · Answer 3 · edited Aug 26 '21 at 17:50

I found the solution by Phil B a good template for my situation. However, all results are returned as lists, rather than atomic data types. To get around this, I added the following helper function and code:

def flat_data(val):
  # Values under 'fields' come back as lists; unwrap the first element
  if isinstance(val, list):
    return val[0]
  else:
    return val

df = pd.DataFrame.from_dict([{k: flat_data(v) for (k, v) in document['fields'].items()}
                            for document in results])
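One caveat with taking the first element unconditionally: a multi-valued field loses everything after its first value, and an empty list raises an `IndexError`. A possible variant, sketched below, only unwraps one-element lists. The names `unwrap_single` and `flatten_fields` are hypothetical, and mapping an empty list to `None` is my own assumption about what is wanted.

```python
def unwrap_single(val):
    """Unwrap one-element lists; map [] to None; leave everything else as-is."""
    if isinstance(val, list):
        if len(val) == 1:
            return val[0]
        if not val:
            return None
    return val


def flatten_fields(hit):
    """Return the hit's 'fields' dict with single-value lists unwrapped."""
    # Hits without a 'fields' key produce an empty row instead of a KeyError
    return {k: unwrap_single(v) for k, v in hit.get("fields", {}).items()}
```

The dataframe is then built with `pd.DataFrame([flatten_fields(document) for document in results])`.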
