Mongoose limit by association

Question

I have a collection like this:

[
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 20 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 30 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 40 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 50 },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 60 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 10 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300 }
]

Given a query with relevant parent ids, ['b','c'], I need to get back the first 3 results for each parent, hopefully DESC-sorted by w:

[
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14 },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200 },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100 }
]

Using .find() and .limit() would return the first N results overall, not the first N for each parent. Using .aggregate() I figured out how to aggregate by parent but I couldn't figure out how to $limit by parent, nor how to return the entire documents as {parent: 'b', items: [{..}, {..}] } instead of just the group data. I can get either parent, which I already had, or maybe parent and an array on some field using $push, but that's still no good.

Lastly I also tried .mapReduce but that seems like overkill, wouldn't I have to emit(this.project, this); for the aggregation part? how would I even $limit on that? by hand? It's quite underdocumented.

Anyways, some direction on which way to go would be great here. I'm using mongoose@latest.

https://jira.mongodb.org/browse/SERVER-9377 – lante Jan 09 '15 at 18:03 — lante, Jan 09 '15 at 18:03

BatScream · Answer 1 · 2015-01-09T21:06:59.363

As pointed out, this unfortunately, this cannot be achieved using the aggregation framework of MongoDB that exists currently, and rightly as you mentioned, map-reduce would be a overkill.

But there are alternate approaches to it:

Approach A:

Maintain a variable denoting the level of hierarchy based on the w field, or the field by which you want to sort the result set. Once you add the variable to each document during insertion.
Your documents would contain a new field called level which holds an array of a single value. We will discuss, why this needs to be an array and not a simple field.

Insert scripts:

db.collection.insert([
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 10,level:[6] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 20,level:[5] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 30,level:[4] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 40,level:[3] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 50,level:[2] },
  { parent: 'a', d1: '1', d2: '2', d3: '3', w: 60,level:[1] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 10,level:[4] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 13,level:[3] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 14,level:[2] },
  { parent: 'b', d1: '1', d2: '2', d3: '3', w: 15,level:[1] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 10,level:[4] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 100,level:[3] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 200,level:[2] },
  { parent: 'c', d1: '1', d2: '2', d3: '3', w: 300,level:[1] }
])

Let's assume you want to get the top 3 results based on the sort order of the w field for each parent. The you could easily aggregate as below:

var levels = [1,2,3];  // indicating the records in the range that we need to pick up,
                       // from each parent.

Match all the parents that are a or b.
Sort the records by the w field.
Group by the parent. Once you group, all the documents for a parent become sub documents of the grouped record, hence allowing you to apply the $redact stage.
Now apply the $redact stage to redact those sub documents, whose level is not a sub set of the levels that we seek. We kept level as an array because it makes it easier to apply the $setIsSubset operator on it. Else we would have required $in, which is not supported inside a $cond expression.

Code:

Model.aggregate(
{$match:{"parent":{$in:["a","b"]}}},
{$sort:{"w":-1}},
{$group:{"_id":"$parent",
         "rec":{$push:"$$ROOT"}}},
{$redact:{$cond:[{$setIsSubset:[{$ifNull:["$levels",[1]]},
                               inp]},
                 "$$DESCEND","$$PRUNE"]}},
,function(err,resp){
 // handle response
})

The obtained output is perfect, as we wanted:(Showing just b group, for keeping it shorter)

{
        "_id" : "b",
        "rec" : [
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e89"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 15,
                        "level" : [
                                1
                        ]
                },
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e88"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 14,
                        "level" : [
                                2
                        ]
                },
                {
                        "_id" : ObjectId("54b030a3e4eae97f395e5e87"),
                        "parent" : "b",
                        "d1" : "1",
                        "d2" : "2",
                        "d3" : "3",
                        "w" : 13,
                        "level" : [
                                3
                        ]
                }
        ]
}

Approach B:

The redaction of the sub documents is done on the client side:

var result = db.collection.aggregate([
{$match:{"parent":{$in:["a","b"]}}},
{$sort:{"w":-1}},
{$group:{"_id":"$parent","rec":{$push:"$$ROOT"}}}
]).map(function(doc){
    doc.rec.splice(0,3);
    return doc;
})

Which is fairly slower, since all the records for each parent would be returned by MongoDB. The choice is yours, depending upon what suits your application.

It doesnt matter. 'level' is just a field which helps identify the top records. When you insert a document you need to make sure it is set according to the field you would want to sort on. — BatScream, Jan 09 '15 at 20:33

score 0 · Accepted Answer · edited May 23 '17 at 11:57

0

After reading this answer to a similar question I decided to go down that path, and I wrote a module that builds the aggregate query for you with a certain degree of flexibility.

Example code based on my initial question:

var _ = require('lodash');
var limited = require('limited');
var D = require('./models/D');

function getLastDsByParent (ids, done) {
  var options = {
    model: D,
    field: 'parent',
    query: { parent : { $in: ids } },
    limit: 3,
    sort: { w: -1 }
  };
  limited(options, find);

  function find (err, result) {
    if (err) {
      done(err); return;
    }

    D
      .find({ _id: { $in: _.flatten(result, 'documents') } })
      .lean()
      .exec(done);
  }
}

edited May 23 '17 at 11:57

Community

1
1

answered Jan 09 '15 at 20:35

bevacqua

47,502
56
171
285

This approach doesn't slice off the top `n` records for a parent, on the database server, and the linked answer has too many stages and involves a lot of overhead which could result in very low performance if your data set is large. If you would want top `n` records, you would end up doing `n` group and `n` unwind stages, leave alone the number of in between project stages. – BatScream Jan 09 '15 at 20:57
It does slice the top `n` records, it's basically the linked answer made into a module. I am concerned about the overhead, for sure. I think I should measure that – bevacqua Jan 09 '15 at 20:57
If I went with your first approach I'd have to modify every single document for `parent` on every insert. – bevacqua Jan 09 '15 at 21:03
Yes i agree, you would require a bulk update. So you need to choose the best solution considering your data set and application code(which would require some amount of rework) if you go by the first approach. – BatScream Jan 09 '15 at 21:05

Mongoose limit by association

2 Answers2

Approach A:

Approach B: