
I am using Elasticsearch as a database, which has millions of records. I am using the code below to retrieve the data, but it is not giving me the complete data.

response = requests.get("http://localhost:9200/cityindex/_search?q=*:*&size=10000")

This is giving me only 10000 records.

When I extend the size to the doc count (which is 784234), it throws an error:

'Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll API for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.'
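(For reference, the [index.max_result_window] limit mentioned in the error is an index-level setting. It could in principle be raised with a settings request like the sketch below, assuming the same local cluster and the cityindex index, although for a full export the scroll API the message points to is the better route.)

import requests

# Sketch only: raise the per-index result window. Generally discouraged for
# large exports, since deep from+size paging increases memory pressure on the cluster.
requests.put(
    "http://localhost:9200/cityindex/_settings",
    json={"index": {"max_result_window": 800000}},
    headers={"Content-Type": "application/json"},
)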

For context on what I want to do: I want to extract all the data of a particular index and then do analysis on it (I am looking to get the whole data in JSON format). I am using Python for my project. Can someone please help me with this?

Redox
  • You should use elasticdump for a task like this: https://stackoverflow.com/questions/34921637/how-to-copy-one-index-documents-to-other-index-in-elasticsearch/34922623#34922623 – Val Jun 14 '22 at 13:35
  • 1
    Does this answer your question? [How do I retrieve more than 10000 results/events in Elasticsearch?](https://stackoverflow.com/questions/41655913/how-do-i-retrieve-more-than-10000-results-events-in-elasticsearch) – Sagar Patel Jun 14 '22 at 13:36

1 Answer


You need to scroll over the pages ES returns to you and store them in a list/array. You can use the elasticsearch Python library for this; example Python code:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200", timeout=30)

# Initial search opens the scroll context and returns the first batch of hits
page = es.search(
    index='index_name',
    scroll='5m',
    size=5000)

sid = page['_scroll_id']
hits = page['hits']['hits']

records = []
while hits:
    print("Scrolling...")
    # Collect the documents from the current batch
    for rec in hits:
        records.append(rec['_source'])
    # Fetch the next batch with the scroll ID; the loop ends on an empty batch
    page = es.scroll(scroll_id=sid, scroll='2m')
    sid = page['_scroll_id']
    hits = page['hits']['hits']

# Release the scroll context once finished
es.clear_scroll(scroll_id=sid)
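
If you are on a recent elasticsearch-py client, the helpers.scan generator does the same scroll bookkeeping for you; a minimal sketch, assuming the same local cluster and a placeholder index name:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200", timeout=30)

# scan() yields every matching hit and manages the scroll_id / clear_scroll calls internally
records = [
    hit['_source']
    for hit in scan(
        es,
        index='index_name',          # e.g. 'cityindex'
        query={"query": {"match_all": {}}},
        scroll='5m',
        size=5000,
    )
]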