3

I have been looking for a solution all morning but could not really find anything I could use in a production environment.

The final goal is, given a mongo collection, get the N oldest messages, and apply an update to them. This is not easily doable, things like

db.coll.update({}, {$set: {status: 'UPDATED'}}).limit(N)

do not work.

So keeping in mind I want to update these N documents (or the whole collection if it has less than N documents), I was thinking about sorting the documents by creation date, getting the Nth one, and updating all documents $lte _id(N). (with y beautiful pseudo-code).

Thing is, I can't seem to be able to find an efficient way to do this. First I tried stuff like:

db.coll.find().sort(_id: 1).limit(N).sort(_id: -1).limit(1)

to then realize that having two sort in the same command was useless...

This works:

db.coll.find().limit(N).skip(N-1).next()

but has two big drawbacks:

  • I have to be sure beforehand that I have at least N documents (which is not that big an issue in my case but...)
  • .skip() is known to be CPU intensive because it actually go through the whole cursor. Although in my case N should not be greater than 1M, still not a good thing, when what I want is really the last document of the cursor.

So I guess my question is, assuming me trying to use the creation date to do this is the right way, how can I either:

  • get the Nth document inserted in my collection (under some criteria)
  • how to get the last record of my cursor (db.coll.find().limit(N))

Thank you a lot!! Alexis

PS: If that makes any difference, we are actually coding in Java for this.

Xælias
  • 848
  • 5
  • 18
  • possible duplicate of [How to limit number of updating documents in mongodb](http://stackoverflow.com/questions/6590890/how-to-limit-number-of-updating-documents-in-mongodb) – Pio Jun 01 '15 at 16:16
  • Yes and no. I don't want to have my million requests and go over them manually as the solution does. The number we are talking about are too big for them to be really gracefully handled my the Java driver :/ If that's the only solution anybody as to offer, I will try it, but getting the last element of my cursor should not be that hard right? – Xælias Jun 01 '15 at 16:23
  • 1
    There is another question that is closer to your question: http://stackoverflow.com/questions/12572485/php-mongodb-find-nth-entry-in-collection. This solution basically suggests to use an auto-incrementing field for your date and index it, then you can update the n-th date there. Of course the usefulness of this approach depends on how do you collect your data (i.e. are you collecting it linearly or not). – Pio Jun 01 '15 at 17:08
  • It feels stupid to add a field when the id already contains a date, and thus an order we should be able to use :/ – Xælias Jun 01 '15 at 18:30

1 Answers1

0

Using map-reduce you can achieve almost the desired result.

The map and reduce functions are trivial:

map = function() {
  this.value.status = 'UPDATED';
  emit(this._id, this.value)
}

reduce = function(key, values) {
  // XXX should log an error if we reach that point
  return {unexpectedReduce: values}
}

The trick is to use the merge output action of mapReduce (as well as limit, sort and query to select only the required input documents):

db.test.mapReduce(map, reduce,
                  { query: {"value.status": {$ne: 'UPDATED'}},
                    sort: { _id: 1 },
                    limit: 10,
                    out: {merge: 'test'},
                  }
)

But, there is a but: you have to store the document in you collection as {_id: ... , value: { field1: ..., field2: ..., ... }} as this is the only output format currently supported by mapReduce jobs.


Here is a sample test collection I used when writing this answer:

> for(i = 0; i < 100; ++i) {
    db.test.insert({value:{field1: i, field2: "hello"+i}}); sleep(500);
  }

(BTW, note that I use ObjectID to identify older documents as the 4 most significant bytes are seconds since the Unix epoch)

Running the above map-reduce jobs will update the collection by batch of 10 older non-updated records:

> db.test.mapReduce(map, reduce,
                  { query: {"value.status": {$ne: 'UPDATED'}},
                    sort: { _id: 1 },
                    limit: 10,
                    out: {merge: 'test'},
                  }
)
> db.test.find()
{ "_id" : ObjectId("556cd4d00027c9fdf8af809f"), "value" : { "field1" : 96, "field2" : "hello96" } }
{ "_id" : ObjectId("556cd4d00027c9fdf8af80a0"), "value" : { "field1" : 97, "field2" : "hello97" } }
{ "_id" : ObjectId("556cd4d10027c9fdf8af80a1"), "value" : { "field1" : 98, "field2" : "hello98" } }
{ "_id" : ObjectId("556cd4d10027c9fdf8af80a2"), "value" : { "field1" : 99, "field2" : "hello99" } }
{ "_id" : ObjectId("556cd49f0027c9fdf8af803f"), "value" : { "field1" : 0, "field2" : "hello0", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a00027c9fdf8af8040"), "value" : { "field1" : 1, "field2" : "hello1", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a00027c9fdf8af8041"), "value" : { "field1" : 2, "field2" : "hello2", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a10027c9fdf8af8042"), "value" : { "field1" : 3, "field2" : "hello3", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a10027c9fdf8af8043"), "value" : { "field1" : 4, "field2" : "hello4", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a20027c9fdf8af8044"), "value" : { "field1" : 5, "field2" : "hello5", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a20027c9fdf8af8045"), "value" : { "field1" : 6, "field2" : "hello6", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a30027c9fdf8af8046"), "value" : { "field1" : 7, "field2" : "hello7", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a30027c9fdf8af8047"), "value" : { "field1" : 8, "field2" : "hello8", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a40027c9fdf8af8048"), "value" : { "field1" : 9, "field2" : "hello9", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a40027c9fdf8af8049"), "value" : { "field1" : 10, "field2" : "hello10" } }
...

Scroll right to see the updated status in the above and below code blocks

And running the same mapReduce job again:

{ "_id" : ObjectId("556cd4d00027c9fdf8af809f"), "value" : { "field1" : 96, "field2" : "hello96" } }
{ "_id" : ObjectId("556cd4d00027c9fdf8af80a0"), "value" : { "field1" : 97, "field2" : "hello97" } }
{ "_id" : ObjectId("556cd4d10027c9fdf8af80a1"), "value" : { "field1" : 98, "field2" : "hello98" } }
{ "_id" : ObjectId("556cd4d10027c9fdf8af80a2"), "value" : { "field1" : 99, "field2" : "hello99" } }
{ "_id" : ObjectId("556cd49f0027c9fdf8af803f"), "value" : { "field1" : 0, "field2" : "hello0", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a00027c9fdf8af8040"), "value" : { "field1" : 1, "field2" : "hello1", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a00027c9fdf8af8041"), "value" : { "field1" : 2, "field2" : "hello2", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a10027c9fdf8af8042"), "value" : { "field1" : 3, "field2" : "hello3", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a10027c9fdf8af8043"), "value" : { "field1" : 4, "field2" : "hello4", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a20027c9fdf8af8044"), "value" : { "field1" : 5, "field2" : "hello5", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a20027c9fdf8af8045"), "value" : { "field1" : 6, "field2" : "hello6", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a30027c9fdf8af8046"), "value" : { "field1" : 7, "field2" : "hello7", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a30027c9fdf8af8047"), "value" : { "field1" : 8, "field2" : "hello8", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a40027c9fdf8af8048"), "value" : { "field1" : 9, "field2" : "hello9", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a40027c9fdf8af8049"), "value" : { "field1" : 10, "field2" : "hello10", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a50027c9fdf8af804a"), "value" : { "field1" : 11, "field2" : "hello11", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a50027c9fdf8af804b"), "value" : { "field1" : 12, "field2" : "hello12", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a60027c9fdf8af804c"), "value" : { "field1" : 13, "field2" : "hello13", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a60027c9fdf8af804d"), "value" : { "field1" : 14, "field2" : "hello14", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a70027c9fdf8af804e"), "value" : { "field1" : 15, "field2" : "hello15", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a70027c9fdf8af804f"), "value" : { "field1" : 16, "field2" : "hello16", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a80027c9fdf8af8050"), "value" : { "field1" : 17, "field2" : "hello17", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a80027c9fdf8af8051"), "value" : { "field1" : 18, "field2" : "hello18", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a90027c9fdf8af8052"), "value" : { "field1" : 19, "field2" : "hello19", "status" : "UPDATED" } }
{ "_id" : ObjectId("556cd4a90027c9fdf8af8053"), "value" : { "field1" : 20, "field2" : "hello20" } }
{ "_id" : ObjectId("556cd4aa0027c9fdf8af8054"), "value" : { "field1" : 21, "field2" : "hello21" } }
...
Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125