4

I have been using Mongoose to insert a large amount of data into a MongoDB database. I noticed that, by default, Mongoose adds _id fields to all subdocuments, leaving me with documents that look like this (I've removed many fields for brevity and shrunk each array to one entry; they generally have more):

{
    "start_time" : ISODate("2013-04-05T02:30:28Z"),
    "match_id" : 165816931,
    "players" : [
            {
                    "account_id" : 4294967295,
                    "_id" : ObjectId("51daffdaa78cee5c36e29fba"),
                    "additional_units" : [ ],
                    "ability_upgrades" : [
                            {
                                    "ability" : 5155,
                                    "time" : 141,
                                    "level" : 1,
                                    "_id" : ObjectId("51daffdaa78cee5c36e29fca")
                            }
                    ]
            }
    ],
    "_id" : ObjectId("51daffdca78cee5c36e2a02e")
}

I have found how to prevent Mongoose from adding these by default (http://mongoosejs.com/docs/guide.html, see option: _id). However, I now have 95 million records with these extraneous _id fields on all subdocuments. I am interested in finding the best way of deleting all of these fields while leaving the _id on the top-level document. My initial thought is to use a bunch of for...in loops on each object, but this seems very inefficient.

Charles A

4 Answers

3

Given Derick's answer, I have created a function to do this:

var deleteIdFromSubdocs = function (obj, isRoot) {
    for (var key in obj) {
        if (isRoot === false && key === "_id") {
            // Drop the _id of any non-root (sub)document.
            delete obj[key];
        } else if (typeof obj[key] === "object") {
            // Recurse into nested objects and arrays.
            deleteIdFromSubdocs(obj[key], false);
        }
    }
    return obj;
};

And run it against a test collection using:

 db.testobjects.find().forEach(function (x) { var y = deleteIdFromSubdocs(x, true); db.testobjects.save(y); });

This appears to work for my test collection. I'd like to see if anyone has any opinions on how this could be done better/any risks involved before I run it against the 95 million document collection.
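One possible refinement, offered as a sketch rather than a tested migration: have the recursive walk report whether it removed anything, so documents that are already clean need not be written back. The name `stripNestedIds` is hypothetical, and the sample below uses plain strings in place of ObjectId values so that it runs outside the mongo shell.

```javascript
// Hypothetical helper (not from the answers in this thread): removes every
// nested "_id" in place and returns true if at least one field was deleted.
function stripNestedIds(node, isRoot) {
    var changed = false;
    for (var key in node) {
        if (!Object.prototype.hasOwnProperty.call(node, key)) continue;
        if (key === "_id") {
            // Keep the top-level _id; delete it everywhere else.
            if (!isRoot) {
                delete node[key];
                changed = true;
            }
            continue;
        }
        var value = node[key];
        // Recurse into nested objects and arrays, skipping null and dates.
        if (value !== null && typeof value === "object" && !(value instanceof Date)) {
            if (stripNestedIds(value, false)) changed = true;
        }
    }
    return changed;
}

// Plain-object stand-in for one document from the question
// (strings replace ObjectId values for illustration only).
var doc = {
    match_id: 165816931,
    players: [{
        account_id: 4294967295,
        _id: "51daffdaa78cee5c36e29fba",
        additional_units: [],
        ability_upgrades: [
            { ability: 5155, time: 141, level: 1, _id: "51daffdaa78cee5c36e29fca" }
        ]
    }],
    _id: "51daffdca78cee5c36e2a02e"
};

var firstPass = stripNestedIds(doc, true);  // removes the two nested _id fields
var secondPass = stripNestedIds(doc, true); // finds nothing left to remove
```

In the shell, the return value could guard the write, for example `if (stripNestedIds(x, true)) db.testobjects.save(x);` inside the forEach callback. That would make a second run over the collection cheaper if the first one is interrupted, though I have not measured it on a collection of this size.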

Charles A
  • I'm accepting my own answer here as it is slightly more generic than the one provided by @Miguel Cartagena and the performance seems to be roughly the same (within an order of magnitude). – Charles A Jul 26 '13 at 09:38
2

The players._id fields could be removed using an update operation, like the following:

db.collection.update({'players._id': {$exists : 1}}, { $unset : { 'players.$._id' : 1 } }, false, true)

However, the positional operator only targets the first matching array element in each document, and it's not possible to use it in nested arrays. So, one solution is to run a script directly on the database:

var cursor = db.collection.find({'players.ability_upgrades._id': {$exists : 1}});

cursor.forEach(function(doc) {

    for (var i = 0; i < doc.players.length; i++) {
        var player = doc.players[i];
        delete player['_id'];

        // Guard against players that have no ability_upgrades array.
        var upgrades = player.ability_upgrades || [];
        for (var j = 0; j < upgrades.length; j++) {
            delete upgrades[j]['_id'];
        }
    }

    db.collection.save(doc);
});

Save the script to a file and call mongo with the file as parameter:

> mongo remove_oid.js --shell
Miguel Cartagena
0

The only solution is to do this one by one, exactly with a for...in loop as you described.

Derick
0

Just another version, try this with AngularJS and MongoDB ;-)

function removeIds (obj, isRoot) {
    for (var key in obj._doc) {
        if (isRoot == false && key == "_id") {
            delete obj._doc._id;
        } else if ((Object.prototype.toString.call( obj[key] ) ===  '[object Array]' )) {
            for (var i=0; i<obj[key].length; i++)
                removeIds(obj[key][i], false);
        }
    }
    return obj;
}

Usage:

var newObj = removeIds(oldObj, true);
delete newObj._id;
MCurbelo