I'm having a much harder time than I expected reading multiple documents from Mongo into RAM in batches. I'm writing an application that communicates with a MongoDB database via pymongo;
the database currently holds about 2 GB of data, but in the near future it could grow to over 1 TB. Because of this, batch reading a limited number of records into RAM at a time is important for scalability.
Based on this post and this documentation I thought this would be about as easy as:
from pymongo import MongoClient

# MONGO_CONN is my connection string (redacted)
HOST = MongoClient(MONGO_CONN)
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name

cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()
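To see what I'm actually getting back, I've just been checking the type of the return value (this print is my own debugging, not something from the post I was following):

print(type(next_1K_records_in_RAM))  # <class 'dict'> -- a single document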
This isn't working for me, however. Even though I have a Mongo collection populated with >200K BSON objects, this reads them one at a time as single dictionaries, e.g. {_id: ID1, ...},
instead of what I'm looking for, which is an array of dictionaries representing multiple documents from my collection, e.g. [{_id: ID1, ...}, {_id: ID2, ...}, ..., {_id: ID1000, ...}].
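To make concrete what I'm hoping for, here is a rough sketch of the kind of batching I'd like to end up with. read_in_batches and process_batch are just names I made up for illustration, not anything I've found in pymongo, and I'm assuming plain iteration over the cursor is the right building block:

from itertools import islice

def read_in_batches(cursor, batch_size=1000):
    # Repeatedly pull up to batch_size documents off the cursor and
    # yield them as a list of dicts; stop when the cursor is exhausted.
    while True:
        batch = list(islice(cursor, batch_size))
        if not batch:
            break
        yield batch

for next_1K_records_in_RAM in read_in_batches(collection.find(), 1000):
    # next_1K_records_in_RAM should look like [{_id: ID1, ...}, ..., {_id: ID1000, ...}]
    process_batch(next_1K_records_in_RAM)  # placeholder for my own processing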
I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.
As this example references a secure, remote data source, it isn't reproducible. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.