I have 75 million records in my MongoDB. I need to read all of the data in batches (of, say, 100,000) and store it in some sort of stream/queue. Once the stream has data, a Python script will read from it and operate on the data.
Basically, I want the results of `collection.find()` to be served in batches of 100,000.
I know I can do `collection.find()[0:100000]`, then `collection.find()[100000:200000]`, and so on, as seen here. But I'm worried about the efficiency of running `skip` every time.
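For context, this is roughly what I mean. A minimal sketch assuming pymongo; the connection details, collection names, and the queue hand-off are placeholders, and the slicing above is equivalent to the explicit `skip()`/`limit()` calls here:

```python
from pymongo import MongoClient

BATCH = 100_000

# Placeholder connection / names -- adjust to the real setup.
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

offset = 0
while True:
    # Every iteration re-runs skip(offset) on the server,
    # which is what worries me for 75M documents.
    batch = list(collection.find().skip(offset).limit(BATCH))
    if not batch:
        break
    # ... push `batch` onto the stream / queue here ...
    offset += BATCH
```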
I'm aware of `cursor.batchSize()`, but I'm not sure how to use it to sequentially keep reading data.
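To show where I'm stuck: as I understand it, `batch_size` only controls how many documents the driver fetches per round trip, while iterating the cursor still yields one document at a time, so I'd apparently have to accumulate the chunks myself, something like this (sketch, same placeholder names as above):

```python
# batch_size tunes how many documents each fetch pulls from the server;
# iteration still yields single documents, so chunking is manual.
cursor = collection.find().batch_size(100_000)

chunk = []
for doc in cursor:
    chunk.append(doc)
    if len(chunk) == 100_000:
        # ... hand `chunk` to the stream / queue ...
        chunk = []
if chunk:
    # ... hand the final partial chunk to the queue ...
    pass
```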
Is there any better way to do this?