
Here is the dataset:

// Data 1
{ name : 111, 
  factors : [
    {name:"f1", value:"dog", unit : "kg"},
    {name:"f2", value:"0"}
  ]
},// data2
{ name : 112, 
  factors : 
  [
    {name:"f1", value:"cat", unit : "g"},
    {name:"f2", value:"13"}
  ]
}
// 100,000 more data ...

I would like to convert the value of factor f2 to a number.

db.getCollection('cases').find({ 
        factors : {
            $elemMatch : {
                 name : "f2",
                 value  : {$type : 2}
            }
        }
}).forEach(function(doc, i){
        doc.factors.forEach(function(factor){
            if(factor.name == "f2"){
               factor.value = !isNaN(factor.value) ? parseInt(factor.value) : factor.value;
            }
        });
        db.cases.save(factor);
});

However, each execution only updates about 75 to 77 documents. I am not sure why; my guess is that save() is asynchronous, so we cannot initiate too many save() calls at the same time.

What should I do?

ppn029012
  • See also related post: [How To Convert String to Numerical Values in MongoDB](https://stackoverflow.com/questions/29487351/how-to-convert-string-to-numerical-values-in-mongodb) – Wan B. Jul 27 '18 at 12:22

1 Answer


The concept here is to loop through your collection with a cursor and, for each document in the cursor, gather the index positions of the relevant factors array elements. You then use this data later in the loop as the update operation parameters, to correctly identify the desired fields to update.
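To make the index bookkeeping concrete, here is a minimal plain-JavaScript sketch (no database needed) of how the $set operator document gets assembled for one document; the buildUpdate helper name is made up for illustration:

```javascript
// Given one document, build the $set operator document that targets each
// "f2" factor by its array index, converting numeric strings to numbers.
function buildUpdate(doc) {
    var updateOperatorDocument = {};
    for (var idx = 0; idx < doc.factors.length; idx++) {
        var factor = doc.factors[idx];
        if (factor.name == "f2" && !isNaN(factor.value)) {
            // "factors.<idx>.value" is MongoDB dot notation for the
            // value field of the array element at position idx
            updateOperatorDocument["factors." + idx + ".value"] =
                parseInt(factor.value, 10);
        }
    }
    return updateOperatorDocument;
}

var sample = { name: 111, factors: [
    { name: "f1", value: "dog", unit: "kg" },
    { name: "f2", value: "0" }
]};
// buildUpdate(sample) → { "factors.1.value": 0 }
```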

Supposing your collection is not that large, the intuition above can be implemented with the cursor's forEach() method, as you did in your attempt, iterating and collecting the index data for each array involved. The following demonstrates this approach for small datasets:

db.cases.find({ "factors.value": { "$exists": true, "$type": 2 } }).forEach(function(doc){
    var factors = doc.factors,
        updateOperatorDocument = {};

    for (var idx = 0; idx < factors.length; idx++){
        if (factors[idx].name == "f2" && !isNaN(factors[idx].value)){
            updateOperatorDocument["factors." + idx + ".value"] = parseInt(factors[idx].value, 10);
        }
    }

    db.cases.updateOne(
        { "_id": doc._id },
        { "$set": updateOperatorDocument }
    );
});

Now, for improved performance, especially when dealing with large collections, take advantage of the Bulk() API to update the collection in bulk. This is quite efficient compared to the operations above because with the bulk API you send the operations to the server in batches (for example, a batch size of 1000), which gives you much better performance: instead of sending every request to the server individually, you send just one request per 1000 operations, making your updates more efficient and quicker.
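Stripped of the database calls, the batching pattern used in both bulk variants boils down to the following plain-JavaScript sketch (processInBatches and flush are made-up names for illustration; flush() stands in for execute()):

```javascript
// Accumulate operations into a batch; each time the batch fills up,
// send it in a single round trip and start a fresh batch.
function processInBatches(ops, batchSize, flush) {
    var batch = [], roundTrips = 0;
    for (var i = 0; i < ops.length; i++) {
        batch.push(ops[i]);
        if (batch.length === batchSize) { // batch full: one server round trip
            flush(batch);
            roundTrips++;
            batch = []; // re-initialise for the next batch
        }
    }
    if (batch.length > 0) { // send any leftover operations
        flush(batch);
        roundTrips++;
    }
    return roundTrips;
}

// 2500 operations in batches of 1000 → 3 round trips instead of 2500
var ops = [];
for (var i = 0; i < 2500; i++) ops.push(i);
var sent = [];
processInBatches(ops, 1000, function(batch){ sent.push(batch.length); });
```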

The following examples demonstrate using the Bulk() API available in MongoDB versions >= 2.6 and < 3.2.

var bulkUpdateOps = db.cases.initializeUnorderedBulkOp(),
    counter = 0;

db.cases.find({ "factors.value": { "$exists": true, "$type": 2 } }).forEach(function(doc){
    var factors = doc.factors,
        updateOperatorDocument = {};

    for (var idx = 0; idx < factors.length; idx++){
        if (factors[idx].name == "f2" && !isNaN(factors[idx].value)){
            updateOperatorDocument["factors." + idx + ".value"] = parseInt(factors[idx].value, 10);
        }
    }

    bulkUpdateOps.find({ "_id": doc._id }).update({ "$set": updateOperatorDocument });

    counter++; // increment counter for batch limit
    if (counter % 1000 == 0) {
        // execute the bulk update operation in batches of 1000
        bulkUpdateOps.execute();
        // re-initialize the bulk operations builder for the next batch
        bulkUpdateOps = db.cases.initializeUnorderedBulkOp();
    }
});

// Clean up any remaining operations in the queue
if (counter % 1000 != 0) { bulkUpdateOps.execute(); }

The next example applies to MongoDB version 3.2, which deprecates the Bulk() API and provides a newer set of APIs using bulkWrite().

It uses the same cursor as above, but builds the array of bulk operations by pushing each bulk write document to it with the same forEach() cursor method. Because write commands can accept no more than 1000 operations, you need to group your operations into batches of at most 1000 and re-initialise the array when the loop hits the 1000th iteration:

var cursor = db.cases.find({"factors.value": { "$exists": true, "$type": 2 }}),
    bulkUpdateOps = [];

cursor.forEach(function(doc){
    var factors = doc.factors,
        updateOperatorDocument = {};

    for (var idx = 0; idx < factors.length; idx++){
        if (factors[idx].name == "f2" && !isNaN(factors[idx].value)){
            updateOperatorDocument["factors." + idx + ".value"] = parseInt(factors[idx].value, 10);
        }
    }

    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": updateOperatorDocument }
        }
    });

    if (bulkUpdateOps.length == 1000) {
        db.cases.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) { db.cases.bulkWrite(bulkUpdateOps); }

Write result for the sample data:

{
    "acknowledged" : true,
    "deletedCount" : 0,
    "insertedCount" : 0,
    "matchedCount" : 2,
    "upsertedCount" : 0,
    "insertedIds" : {},
    "upsertedIds" : {}
}
chridam
    Your answer is missing something. You also need to *execute the bulk update operation in batches of 1000* in your forEach loop in 3.2 then re-init the array. – styvane May 19 '16 at 10:01
  • 1
    You are right, though the general case here is [mongo shell operations do not have this limit](https://docs.mongodb.com/manual/reference/limits/#Write-Command-Operation-Limit-Size). – chridam May 19 '16 at 10:11