
I have a query which looks like:

function getPage(page) {
  return db.messages.aggregate([
    { $group: { _id: "$subjectID" } },
    { $skip: page * 20 },
    { $limit: 20 }
  ]);
}

Say I have a subjectID that I know appears somewhere in that collection. What I want to do is write something like:

function pageOf(subjectID) {
  return Math.floor(db.messages.aggregate([
    { $group: { _id: "$subjectID" } }
  ]).indexOf({ _id: subjectID }) / 20);
}

Except I have no idea how to write the indexOf part of that query. I was wondering if MongoDB might have some sort of "take while" or "take until" query; then I could do that, followed by a count of the number of items.

ForbesLindesay

1 Answer


Ordering by subjectID

If your subjectID is (or can be changed to) a monotonically increasing value (for example, a default MongoDB ObjectId), you have a straightforward option using a normal find() with appropriate sort, skip, and limit. In that case you can look for documents whose subjectID is $gte (greater than or equal to) your known subjectID:

var page = 1;
var subjectID = ObjectId("515535a0760fe8735f5f6897");
db.messages.find(
    { subjectID: { $gte: subjectID } }   // start at the known subjectID
).sort({ subjectID: 1 }).skip(page * 20).limit(20)

Aggregation Framework

As of MongoDB 2.4, there is no feature in the Aggregation Framework to match based on a document's position in the result pipeline. You could file a feature suggestion in the SERVER queue of the MongoDB Jira project.

It sounds like you would want a new pipeline operator such as a $matchFrom, which would ignore any documents until the first occurrence of the $matchFrom criteria. You could then add a $limit to take the next n items. You would also want sorted output before the $matchFrom so there is a predictable outcome.
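As a rough sketch, such a pipeline might look like the following. Note that $matchFrom is purely hypothetical and does not exist in any MongoDB release; this only illustrates what the suggested feature could look like:

// HYPOTHETICAL: $matchFrom is not a real MongoDB operator.
// This sketch only shows the shape the suggested feature might take.
db.messages.aggregate([
    { $group: { _id: "$subjectID" } },
    { $sort: { _id: 1 } },                // sort first for a predictable order
    { $matchFrom: { _id: subjectID } },   // ignore documents until the first match
    { $limit: 20 }                        // then take the next page of items
])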

This seems overcomplicated compared to having an increasing subjectID, but there may be a use case for doing paging based on more advanced search criteria or results calculated in the aggregation pipeline.

Alternative approaches

Aside from future support for such a feature in the Aggregation Framework, you have a few options to implement the same matching approach in code:

  • use the older group() aggregation command with a finalize() function. NOTE: group() does not work with sharded clusters.

  • use MapReduce and a finalize() function

  • fetch the whole result set from the Aggregation Framework and implement the matching/reduction of results in your application code, as sketched after this list (though this somewhat defeats the "paging" notion if you are fetching all pages for every request).
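For example, the last option might look like this in the mongo shell. This is only a sketch based on the pageOf() shape from the question; it pulls every group into memory, so it will not scale well:

// Sketch: compute the page number in application code by scanning the
// full (sorted) list of groups.
function pageOf(subjectID) {
    var groups = db.messages.aggregate([
        { $group: { _id: "$subjectID" } },
        { $sort: { _id: 1 } }     // stable order so the index is meaningful
    ]).result;                    // .result holds the array in 2.4-era shells;
                                  // newer shells return a cursor, so use .toArray()

    for (var i = 0; i < groups.length; i++) {
        // string-compare so this also works when subjectID is an ObjectId
        if (String(groups[i]._id) == String(subjectID)) {
            return Math.floor(i / 20);
        }
    }
    return -1;                    // subjectID not found
}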

Performance considerations

Queries with skip still have to read through the intervening index entries, so skipping a large number of documents will not be very efficient.

Instead of paging with a skip offset, consider doing successive page queries that start from the last entry of the previous page: the first page would be $gte the starting subjectID, and each subsequent page would be $gt the last subjectID included on the previous page. How well this works depends on how you present paging in your user interface: it is easiest if the UI only offers a "next" page of messages rather than jumping to an arbitrary page.
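A minimal sketch of that approach, assuming (per the question) that subjectID is indexed and monotonically increasing:

// First page: start at (or after) a known subjectID
var docs = db.messages.find({ subjectID: { $gte: subjectID } })
                      .sort({ subjectID: 1 })
                      .limit(20)
                      .toArray();

// Next page: resume strictly after the last subjectID on this page
// (assumes docs is non-empty; check docs.length in real code)
var last = docs[docs.length - 1].subjectID;
db.messages.find({ subjectID: { $gt: last } })
           .sort({ subjectID: 1 })
           .limit(20);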

Stennie
  • The ordering thing is a really good idea. Unfortunately I can't use $gt/$gte for my pagination because my page URLs are of the form `/` and I want to stick with that. I have about 26000 items and use a `$group` operation to produce about 20000 groups. I'm adding about 15-20 items per day and it's still performing fine at the moment. I'm making a fair bit of use of caching, and query times of up to about 1 second are still acceptable. – ForbesLindesay Apr 14 '13 at 10:00
  • *to produce about 2000 groups – ForbesLindesay Apr 14 '13 at 12:24