I have a MongoDB collection with a few million records that I need to iterate over for preprocessing, then store the results in a separate collection. Would it be better to loop over a cursor limited to small chunks, or to iterate over a single cursor with no limit defined?
from pymongo import MongoClient
from other_file import process
mc = MongoClient()
collection_obj = mc.mydb.mycoll
# method 1: a single unbounded cursor over the whole collection
cursor1 = collection_obj.find({})
for each_ele in cursor1:
    process(each_ele)
# method 2: paginate with limit/skip in chunks of 5000
total_length = collection_obj.count_documents({})
for i in range(0, total_length, 5000):
    cursor2 = collection_obj.find({}).limit(5000).skip(i)
    for each in cursor2:
        process(each)
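For context, this is roughly how I plan to write the processed documents into the separate collection. It is only a sketch: it assumes process() returns a dict, and mydb.mycoll_processed is a placeholder name for the target collection.
# Sketch: buffer processed documents and bulk-insert them into the
# target collection (mydb.mycoll_processed is a placeholder name)
target_coll = mc.mydb.mycoll_processed
buffer = []
for each_ele in collection_obj.find({}):
    buffer.append(process(each_ele))  # assumes process() returns a dict
    if len(buffer) >= 1000:
        target_coll.insert_many(buffer)
        buffer = []
if buffer:
    target_coll.insert_many(buffer)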
Which of these approaches would be better for a large data set?