1

I have a collection like this:

[
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 20 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 30 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 40 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 50 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 60 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300 }
]

Given a query with relevant parent ids, ['b','c'], I need to get back the first 3 results for each parent, hopefully DESC-sorted by w:

[
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100 }
]

Using .find() and .limit() would return the first N results overall, not the first N for each parent. Using .aggregate() I figured out how to aggregate by parent but I couldn't figure out how to $limit by parent, nor how to return the entire documents as {parent: 'b', items: [{..}, {..}] } instead of just the group data. I can get either parent, which I already had, or maybe parent and an array on some field using $push, but that's still no good.

Lastly I also tried .mapReduce but that seems like overkill, wouldn't I have to emit(this.project, this); for the aggregation part? how would I even $limit on that? by hand? It's quite underdocumented.

Anyways, some direction on which way to go would be great here. I'm using mongoose@latest.

bevacqua
  • 47,502
  • 56
  • 171
  • 285

2 Answers2

1

As pointed out, this unfortunately, this cannot be achieved using the aggregation framework of MongoDB that exists currently, and rightly as you mentioned, map-reduce would be a overkill.

But there are alternate approaches to it:

Approach A:

  • Maintain a variable denoting the level of hierarchy based on the w field, or the field by which you want to sort the result set. Once you add the variable to each document during insertion.
  • Your documents would contain a new field called level which holds an array of a single value. We will discuss, why this needs to be an array and not a simple field.

Insert scripts:

db.collection.insert([
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 10,level:[6] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 20,level:[5] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 30,level:[4] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 40,level:[3] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 50,level:[2] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 60,level:[1] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 10,level:[4] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13,level:[3] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14,level:[2] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15,level:[1] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 10,level:[4] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100,level:[3] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200,level:[2] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300,level:[1] }
])

Let's assume you want to get the top 3 results based on the sort order of the w field for each parent. The you could easily aggregate as below:

var levels = [1,2,3];  // indicating the records in the range that we need to pick up,
                       // from each parent. 
  • Match all the parents that are a or b.
  • Sort the records by the w field.
  • Group by the parent. Once you group, all the documents for a parent become sub documents of the grouped record, hence allowing you to apply the $redact stage.
  • Now apply the $redact stage to redact those sub documents, whose level is not a sub set of the levels that we seek. We kept level as an array because it makes it easier to apply the $setIsSubset operator on it. Else we would have required $in, which is not supported inside a $cond expression.

Code:

Model.aggregate(
{$match:{"parent":{$in:["a","b"]}}},
{$sort:{"w":-1}},
{$group:{"_id":"$parent",
         "rec":{$push:"$$ROOT"}}},
{$redact:{$cond:[{$setIsSubset:[{$ifNull:["$levels",[1]]},
                               inp]},
                 "$$DESCEND","$$PRUNE"]}},
,function(err,resp){
 // handle response
})

The obtained output is perfect, as we wanted:(Showing just b group, for keeping it shorter)

{
        "_id" : "b",
        "rec" : [
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e89"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 15,
                        "level" : [
                                1
                        ]
                },
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e88"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 14,
                        "level" : [
                                2
                        ]
                },
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e87"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 13,
                        "level" : [
                                3
                        ]
                }
        ]
}

Approach B:

The redaction of the sub documents is done on the client side:

var result = db.collection.aggregate([
{$match:{"parent":{$in:["a","b"]}}},
{$sort:{"w":-1}},
{$group:{"_id":"$parent","rec":{$push:"$$ROOT"}}}
]).map(function(doc){
    doc.rec.splice(0,3);
    return doc;
})

Which is fairly slower, since all the records for each parent would be returned by MongoDB. The choice is yours, depending upon what suits your application.

BatScream
  • 19,260
  • 4
  • 52
  • 68
  • `w` is actually a date, not just a hierarchy level – bevacqua Jan 09 '15 at 20:30
  • It doesnt matter. 'level' is just a field which helps identify the top records. When you insert a document you need to make sure it is set according to the field you would want to sort on. – BatScream Jan 09 '15 at 20:33
0

After reading this answer to a similar question I decided to go down that path, and I wrote a module that builds the aggregate query for you with a certain degree of flexibility.

Example code based on my initial question:

var _ = require('lodash');
var limited = require('limited');
var D = require('./models/D');

function getLastDsByParent (ids, done) {
  var options = {
    model: D,
    field: 'parent',
    query: { parent : { $in: ids } },
    limit: 3,
    sort: { w: -1 }
  };
  limited(options, find);

  function find (err, result) {
    if (err) {
      done(err); return;
    }

    D
      .find({ _id: { $in: _.flatten(result, 'documents') } })
      .lean()
      .exec(done);
  }
}
Community
  • 1
  • 1
bevacqua
  • 47,502
  • 56
  • 171
  • 285
  • This approach doesn't slice off the top `n` records for a parent, on the database server, and the linked answer has too many stages and involves a lot of overhead which could result in very low performance if your data set is large. If you would want top `n` records, you would end up doing `n` group and `n` unwind stages, leave alone the number of in between project stages. – BatScream Jan 09 '15 at 20:57
  • It does slice the top `n` records, it's basically the linked answer made into a module. I am concerned about the overhead, for sure. I think I should measure that – bevacqua Jan 09 '15 at 20:57
  • If I went with your first approach I'd have to modify every single document for `parent` on every insert. – bevacqua Jan 09 '15 at 21:03
  • Yes i agree, you would require a bulk update. So you need to choose the best solution considering your data set and application code(which would require some amount of rework) if you go by the first approach. – BatScream Jan 09 '15 at 21:05