I'm trying to retrieve an entire collection from a MongoDB database and save it locally as a JSON file. My first attempt uses skip and limit (which also means I need to know how big the collection is beforehand), but the execution time is very slow.

    const dbInstance = await connectDB();
    const collection = dbInstance.db(db).collection(schema);

    console.time('retrievedAllDataset')
   
    const _cursors = []

    const batchSize = 50000;
    // the collection has ~20 million documents, so the total has to be known beforehand
    for (let i = 0; i < 20000000; i += batchSize) {
      _cursors.push(
        // `query` is the aggregation pipeline; toArray() makes each entry a promise
        // that resolves with one batch of documents
        collection.aggregate(query, { allowDiskUse: true }).skip(i).limit(batchSize).toArray()
      );

      console.log("batch number:", (i/batchSize))
    }

    Promise.all(_cursors).then((values) => {
      // `values` is an array of batches, one array of documents per batch
      console.time("perBatch")
      // write to json here.  
      // do some other operation
      console.timeEnd("perBatch")
    }).finally(() => {
      console.timeEnd('retrievedAllDataset')
      dbInstance.close()
    });

Is there a better way to exhaust the whole collection, so that I don't need to know the document count beforehand or "manually" batch the download with skip and limit, whether I'm saving it locally or doing other operations in my Node application?

Thanks,

Saber Alex
  • Does this help? https://stackoverflow.com/questions/8991292/dump-mongo-collection-into-json-format – Charchit Kapoor Sep 30 '22 at 07:12
  • use `mongoexport` cli or a mongo client like studio 3T – bmz1 Sep 30 '22 at 09:15
  • If the goal is to pull large sets of data into JSON files, then I agree that `mongoexport` is the way to go. Probably no reason to rewrite your own version. It includes a `--query` parameter that you may make use of. If instead you are going to stick with a custom solution in node, then consider setting a batch size (if needed) and simply iterating the cursor to exhaustion. No need to repeatedly query and cause the database to do extra work. – user20042973 Sep 30 '22 at 12:05
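
For anyone staying inside Node rather than reaching for `mongoexport`, a minimal sketch of the "iterate the cursor to exhaustion" approach from the last comment could look like the following. It reuses the `connectDB`, `db`, and `schema` names from the question; the output filename and batch size are placeholder choices, and `batchSize` here only controls how many documents the driver fetches per round trip, not how many you can read in total.

    const fs = require('fs');

    async function exportCollectionToJson() {
      const dbInstance = await connectDB();
      const collection = dbInstance.db(db).collection(schema);

      // a single cursor over the whole collection -- no need to know the count
      // beforehand or to issue repeated skip/limit queries
      const cursor = collection.find({}, { batchSize: 50000 });

      const out = fs.createWriteStream('dump.json');
      out.write('[\n');
      let first = true;

      // for await ... of keeps pulling batches from the server until the
      // cursor is exhausted
      for await (const doc of cursor) {
        if (!first) out.write(',\n');
        out.write(JSON.stringify(doc));
        first = false;
      }

      out.write('\n]\n');
      out.end();
      await dbInstance.close();
    }

This writes one JSON array incrementally instead of materializing 20 million documents in memory at once; for very large exports you may still want to handle write-stream backpressure or switch to newline-delimited JSON.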
