15

I have a collection of about 1 million records with 20 fields each. I need to update an integer flag field in every record (document), randomly assigning 1 or 2 to it. How can I do this while iterating a cursor over the complete collection? It does not seem like a good idea to search a second time for an object MongoDB has already found just to be able to update it:

  DBCursor cursor = coll.find();
  try {
     while (cursor.hasNext()) {
        BasicDBObject obj = (BasicDBObject) cursor.next();
        ...
        coll.update(query, newObj);
     }
  } finally {
     cursor.close();
  }

How can I efficiently update a field in every document of a huge MongoDB collection with different values?

Anton Ashanin
  • You can update all documents (that match a specific condition) in a single query with the 'multi' flag in 'update' command set to true. Check this: http://stackoverflow.com/questions/4146452/mongodb-what-is-the-fastest-way-to-update-all-records-in-a-collection – Aafreen Sheikh Apr 12 '13 at 11:09
  • 1
    I can't use `multi` flag because I update every document with unique value. This is not the same thing as updating many documents with one and the same value. – Anton Ashanin Apr 12 '13 at 11:41
  • [https://stackoverflow.com/questions/4146452/mongodb-what-is-the-fastest-way-to-update-all-records-in-a-collection/50768815#50768815](https://stackoverflow.com/questions/4146452/mongodb-what-is-the-fastest-way-to-update-all-records-in-a-collection/50768815#50768815) I have answered there. Hope it helps. – shijin Jun 08 '18 at 21:56

2 Answers

21

Your approach is basically correct. However, I wouldn't consider such a collection "huge". You can run something similar from the shell:

coll.find({}).forEach(function (doc) {
    doc.flag = Math.floor((Math.random()*2)+1);
    coll.save(doc);
});

Depending on your MongoDB version, configuration, and load, this may take anywhere from a few minutes to several hours.
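The flag expression in the shell snippet above, Math.floor((Math.random()*2)+1), has a direct Java counterpart, which may help since the question uses the Java driver. What follows is only a minimal sketch: the class name RandomFlag and the method between are hypothetical helpers, not part of any MongoDB API, and the code relies solely on the JDK's ThreadLocalRandom.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not a MongoDB API): picks the per-document flag value.
final class RandomFlag {

    private RandomFlag() {
    }

    /**
     * Returns a uniformly chosen integer in [min, max], both ends inclusive.
     * Assumes max < Integer.MAX_VALUE so that max + 1 does not overflow.
     */
    static int between(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // The upper bound of nextInt(origin, bound) is exclusive, hence max + 1.
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }
}
```

Calling RandomFlag.between(1, 2) once per iteration of the cursor loop would give each document its own value of 1 or 2.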

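If you want to perform this update in batches, add a range condition to your query document, for example coll.find({"aField" : {$gt : minVal, $lt : maxVal}}). Both operators must sit in one sub-document: repeating the "aField" key in the object literal would keep only the last condition.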
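To turn the range condition into a series of batches, the overall key range has to be cut into consecutive sub-ranges first. The sketch below shows one way to do that in plain Java. Everything in it is an assumption made for illustration: the names BatchRanges, Range and split are hypothetical, the field is assumed to hold integer values, and the ranges are half-open [from, to), which corresponds to a $gte / $lt pair rather than $gt / $lt, so that no boundary value is skipped or visited twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a numeric key range into batch-sized sub-ranges.
final class BatchRanges {

    private BatchRanges() {
    }

    /** Half-open range [from, to); intended as {$gte: from, $lt: to} bounds. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Splits [minVal, maxVal) into consecutive ranges of at most batchSize keys.
     * Assumes minVal + batchSize steps stay within the range of long.
     */
    static List<Range> split(long minVal, long maxVal, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long from = minVal; from < maxVal; from += batchSize) {
            // The last range is clipped so it never runs past maxVal.
            ranges.add(new Range(from, Math.min(from + batchSize, maxVal)));
        }
        return ranges;
    }
}
```

Each resulting Range would then supply the bounds for one find call, and the cursor loop would run once per range.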

Ori Dar
  • 1
    In my approach every document is searched by MongoDB twice. Does it make sense? – Anton Ashanin Apr 12 '13 at 11:39
  • My rectified function reduces it to a single cursor query; you don't make an additional query per iteration. As you can see, I use `coll.save(doc)` – Ori Dar Apr 12 '13 at 11:42
  • 1
    This has issues: see http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors – Nashenas May 06 '15 at 18:35
  • Don't forget to add `noCursorTimeout()` if you're working with a huge collection! Otherwise the command will time out after 10 minutes (at least it did for me). So the top line of Ori's answer becomes: `coll.find({}).noCursorTimeout().forEach(function (doc) {` –  May 09 '16 at 12:14
  • This is a javascript example. The OP asked for a Java one. – Amnon May 06 '22 at 12:13
  • This is a javascript example indeed. The OP has accepted it though. There you go. No need to be angry. – Ori Dar May 12 '22 at 18:24
4

My solution to my own question, inspired by @orid:

public void tagAll(int min, int max) {
    int rnd = 0;
    DBCursor cursor = this.dataColl.find();
    try {
        while (cursor.hasNext()) {
            BasicDBObject obj = (BasicDBObject) cursor.next();
            rnd = min + (int) (Math.random() * ((max - min) + 1));
            obj.put("tag", rnd);
            this.dataColl.save(obj);
        }
    } finally {
        cursor.close();
    }
}
Anton Ashanin