
I am using mongoskin in my Node.js application to insert data into MongoDB. I have a requirement to insert an array of documents into the database and send the _ids of the inserted records back to the client. I am able to insert the data, however I am unable to locate the _ids of the inserted records in the result object. I need help locating the insertedIds in the result. I'm using the code below to bulk insert.

db.collection('myCollection', function (err, collection) {
    var bulk = collection.initializeUnorderedBulkOp();
    for (var i = 0; i < dataArray.length; i++) {
        bulk.insert(dataArray[i]);
    }

    bulk.execute(function (err, result) {
      //TODO: return the Ids of inserted records to the client
      //Client will use these Ids to perform subsequent calls to the nodejs service
    });
});

The result I get back is of type BatchWriteResult.
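
For reference, dumping the result (a minimal sketch, assuming the legacy BatchWriteResult's toJSON() method; the printed shape is illustrative) shows only counts for plain inserts, with no _id values:

bulk.execute(function (err, result) {
    console.log(result.toJSON());
    // e.g. { ok: 1, writeErrors: [], writeConcernErrors: [],
    //        nInserted: 3, nUpserted: 0, nMatched: 0,
    //        nModified: 0, nRemoved: 0, upserted: [] }
});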

Kavya Mugali

1 Answer


I would suggest using the other bulk API method, upsert(), which allows you to get the _id values of the inserted documents from the BatchWriteResult object by calling its getUpsertedIds() method. The result object has the same format as given in the documentation for BulkWriteResult.

An update operation with the Bulk.find.upsert() option performs an insert when there are no documents matching the Bulk.find() condition. If the update document does not specify an _id field, MongoDB adds one, and thus you can retrieve the _ids of the inserted documents from your BatchWriteResult.
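
As a minimal sketch of that behaviour (the collection handle and the sample document { name: 'foo' } are placeholders; the entry shape follows the BulkWriteResult documentation):

var bulk = collection.initializeUnorderedBulkOp();
// No document matches the find() condition, so the upsert performs
// an insert and MongoDB assigns the _id
bulk.find({ name: 'foo' }).upsert().updateOne({ name: 'foo' });
bulk.execute(function (err, result) {
    console.log(result.getUpsertedIds());
    // => [ { index: 0, _id: ObjectId("...") } ]
});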

Also, the way you are queueing up your bulk insert operations is not usually recommended, since it all builds up in memory. You'd want a bit more control over the queue and memory use rather than relying on the driver's defaults, which limit batches to 1000 operations at a time and require the complete batch to be under 16MB. You can get that control by iterating over your data array with forEach() and a counter that executes and resets the bulk every 1000 operations.


The following demonstrates this approach:

function getInsertedIds(result){
    var ids = result.getUpsertedIds();
    console.log(ids); // array of { index: <op index>, _id: <ObjectId> } entries
    return ids;
}

db.collection('myCollection',function(err,collection) {
    var bulk = collection.initializeUnorderedBulkOp(),
        insertedIds = [],
        counter = 0;

    dataArray.forEach(function (data){
        bulk.find(data).upsert().updateOne(data);
        counter++;

        if (counter % 1000 == 0) {
            bulk.execute(function(err, result) {
                insertedIds = insertedIds.concat(getInsertedIds(result));
            });
            // Reset synchronously so later iterations queue on a fresh batch;
            // resetting inside the async callback would let them queue on the
            // already-executed bulk
            bulk = collection.initializeUnorderedBulkOp();
        }
    });

    // Execute any remaining operations in the queue - i.e. when the
    // final count is not an exact multiple of 1000
    if (counter % 1000 != 0 ) {
        bulk.execute(function(err, result) {
            insertedIds = insertedIds.concat(getInsertedIds(result));
            console.log(insertedIds);
        });
    }
});
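
As noted in the comments below, the upsert approach updates rather than inserts when an identical document is sent again. If every call must blindly insert, one workaround (a sketch, assuming the native driver's ObjectID is accessible, e.g. via require('mongodb')) is to generate the _id values client-side before queueing plain inserts, so the ids are known without reading them back from the result:

var ObjectID = require('mongodb').ObjectID;

db.collection('myCollection', function (err, collection) {
    var bulk = collection.initializeUnorderedBulkOp(),
        insertedIds = [];

    dataArray.forEach(function (data) {
        data._id = new ObjectID();  // generate the id client-side
        insertedIds.push(data._id); // known before the write even runs
        bulk.insert(data);
    });

    bulk.execute(function (err, result) {
        console.log(insertedIds); // ids of all inserted documents
    });
});

The same 1000-per-batch counter pattern from above applies here as well for large arrays.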
chridam
  • So that means upsert will return the ids of newly created documents, but insert will not? If so, that is very clever. I like it! – profesor79 Jun 27 '16 at 08:45
  • If using the latest Node.js driver then that would be possible with just the `Bulk.insert()` method using the [**`getInsertedIds()`**](http://mongodb.github.io/node-mongodb-native/2.1/api/BulkWriteResult.html#getInsertedIds) method. However, because it's returning the [**`BatchWriteResult`**](https://mongodb.github.io/node-mongodb-native/api-generated/batchwriteresult.html), the workaround I know would be to go via the `upsert()` way and use the [**`getUpsertedIds()`**](https://mongodb.github.io/node-mongodb-native/api-generated/batchwriteresult.html#getupsertedids) method. – chridam Jun 27 '16 at 08:51
  • @chridam: Thanks for the explanation and code. This works, but only for the first run. If I want to insert the same document again, the document gets updated instead of inserted. My requirement is that dataArray should get inserted every time, irrespective of whether the data already exists. I tried to force an insert by 1. `bulk.find({}).upsert().updateOne(data);` - it didn't work. 2. `bulk.find({food:'bar'}).upsert().updateOne(data)` - I pass an object which can never occur in my collection. This is extremely slow and hung with 20000 records in the array. – Kavya Mugali Jun 27 '16 at 09:57
  • When the api to insert records is invoked, the job of my service is to simply insert the array sent in the request. If the same array is passed twice, the service should blindly insert the array again. E.g., let's say the client invokes the insert api with an array of 10 records. I insert it. After some time the client again invokes the api with the same 10 records (of course there is no _id field here). I should insert it again. At the end of the 2 executions, there should be 20 records. The code you provided results in only 10 records after 2 executions. – Kavya Mugali Jun 27 '16 at 11:38
  • @chridam can we get the ids of updated documents too instead of only inserted ones? I'm using pymongo. – y_159 Sep 06 '21 at 02:54