I am using the Elasticsearch 6.1 API for Python and I am trying to read a certain value from every document in the index (303 958 documents).

doc = {
    'size' : 1000,
    'query' : {
        'match_all' : {}
    }
}

samplesCount = 0

res = es.search(index="index", doc_type='data', body=doc, scroll='1m')
scrollId = res['_scroll_id']

scrollSize = res['hits']['total']

while scrollSize > 0 :
    for x in range (0, len(res['hits']['hits']) - 1) :
        name = res['hits']['hits'][x]['_source']['name']
        samplesCount += 1
        print(str(samplesCount) + '. ' + name)
        scrollSize -= 1

    res = es.scroll(scroll_id=scrollId, scroll='1m')

The count (samplesCount) stops at 303 654, and it seems that es.scroll returns no results for the remaining documents (around 300, which is less than one scroll page).

What also makes me curious is that it stops at 303 654. I would expect a round number (a multiple of 1000).

Any ideas?

Thank you very much for any helpful tips.

Tomas Lukac

1 Answer


Try replacing

range (0, len(res['hits']['hits']) - 1) 

with

range(0, len(res['hits']['hits']))

or (equivalently)

range(len(res['hits']['hits']))
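As a minimal illustration of why the two replacements are equivalent and the original is not, here is a sketch that uses a hypothetical three-element list as a stand-in for res['hits']['hits']. The variable names are made up for the example.

```python
# Hypothetical stand-in for res['hits']['hits'] (one page of results).
hits = ['doc-a', 'doc-b', 'doc-c']

# range() excludes its stop value, so subtracting 1 drops the last index.
original = list(range(0, len(hits) - 1))
replacement = list(range(0, len(hits)))
shorthand = list(range(len(hits)))

print(original)     # [0, 1]
print(replacement)  # [0, 1, 2]
print(shorthand)    # [0, 1, 2]

# Iterating over the list directly avoids index arithmetic altogether.
visited = [hit for hit in hits]
print(visited)      # ['doc-a', 'doc-b', 'doc-c']
```

Looping over the hits themselves (for hit in res['hits']['hits']) is usually the more idiomatic choice in Python, since there is no stop value to get wrong.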

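Judging from the syntax and the numbers you quote, you are skipping one record per iteration of the while loop: range excludes its stop value, so subtracting 1 drops the last hit of every page. With 303 958 documents and a page size of 1000 there are 304 pages, and 303 958 - 304 = 303 654, which is exactly where your count stops.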
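The off-by-one can be checked against the numbers in the question with a small simulation that needs no Elasticsearch cluster. This is only a sketch: count_processed and its parameters are hypothetical names, and it assumes every page except the last is full.

```python
def count_processed(total_docs, page_size, drop_last):
    """Count the documents visited when paging through total_docs results.

    drop_last=True mimics range(0, len(page) - 1) from the question;
    drop_last=False mimics range(0, len(page)).
    """
    processed = 0
    remaining = total_docs
    while remaining > 0:
        page_len = min(page_size, remaining)  # the last page may be shorter
        stop = page_len - 1 if drop_last else page_len
        processed += len(range(0, stop))
        remaining -= page_len
    return processed


buggy = count_processed(303958, 1000, drop_last=True)
fixed = count_processed(303958, 1000, drop_last=False)
print(buggy, fixed)  # 303654 303958
```

The buggy variant loses exactly one document on each of the 304 pages, which reproduces the 303 654 reported in the question.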

Davide Fiocco
  • I feel so dumb ... I am kinda new to Python and I didn't realize that a for x in range(y, z) loop already runs only until z-1 ... Thank you so much – Tomas Lukac Jan 22 '18 at 20:51