I'm trying to retrieve an entire collection from a MongoDB database and save it locally as a JSON file. My first attempt used skip and limit (which also means I need to know how big the collection is beforehand), but the execution time is very slow.
const dbInstance = await connectDB();
const collection = dbInstance.db(db).collection(schema);

console.time('retrievedAllDataset');

const _batchPromises = [];
const batchSize = 50000;
const totalDocs = 20000000; // I have to know the collection size in advance

for (let i = 0; i < totalDocs; i += batchSize) {
  _batchPromises.push(
    // query is my aggregation pipeline; toArray() resolves to this batch's documents
    collection.aggregate(query, { allowDiskUse: true }).skip(i).limit(batchSize).toArray()
  );
  console.log('batch number:', i / batchSize);
}

Promise.all(_batchPromises)
  .then((batches) => {
    console.time('perBatch');
    // write to json here.
    // do some other operation
    console.timeEnd('perBatch');
  })
  .finally(() => {
    console.timeEnd('retrievedAllDataset');
    dbInstance.close();
  });
Is there a better way to exhaust the whole collection, so that I don't need to know the document count beforehand or "manually" batch the download with skip and limit, before writing it to a local JSON file or doing other operations in my Node application?
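For example, would iterating a single cursor and streaming it straight to a file be the more idiomatic approach? Something like this rough sketch (exportCollection and outputPath are just placeholder names I made up, and I'm not sure this is the right way to handle it):

const fs = require('fs');
const { once } = require('events');

async function exportCollection(collection, outputPath) {
  const out = fs.createWriteStream(outputPath);
  // A single cursor over everything; the driver fetches batches internally,
  // so there is no skip/limit bookkeeping on my side.
  const cursor = collection.find({}); // or collection.aggregate(pipeline, { allowDiskUse: true })

  out.write('[');
  let first = true;
  for await (const doc of cursor) {
    // Respect backpressure: wait for 'drain' when the write buffer is full.
    if (!out.write((first ? '' : ',\n') + JSON.stringify(doc))) {
      await once(out, 'drain');
    }
    first = false;
  }
  out.write(']');
  out.end();
  await once(out, 'finish');
}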
Thanks,