0

I've tried several ways of creating an aggregation pipeline which returns just the matching entries from a document's embedded array and not found any practical way to do this.

Is there some MongoDB feature which would avoid my very clumsy and error-prone approach?

A document in the 'workshop' collection looks like this...

{
  "_id": ObjectId("57064a294a54b66c1f961aca"),
  "type": "normal",
  "version": "v1.4.5",
  "invitations": [],
  "groups": [
    {
      "_id": ObjectId("57064a294a54b66c1f961acb"),
      "role": "facilitator"
    },
    {
      "_id": ObjectId("57064a294a54b66c1f961acc"),
      "role": "contributor"
    },
    {
      "_id": ObjectId("57064a294a54b66c1f961acd"),
      "role": "broadcaster"
    },
    {
      "_id": ObjectId("57064a294a54b66c1f961acf"),
      "role": "facilitator"
    }
  ]
}

Each entry in the groups array provides a unique ID so that a group member is assigned the given role in the workshop when they hit a URL with that salted ID.

Given a _id matching an entry in a groups array like ObjectId("57064a294a54b66c1f961acb"), I need to return a single record like this from the aggregation pipeline - basically returning the matching entry from the embedded groups array only.

    {
      "_id": ObjectId("57064a294a54b66c1f961acb"),
      "role": "facilitator",
      "workshopId": ObjectId("57064a294a54b66c1f961aca")
    },

In this example, the workshopId has been added as an extra field to identify the parent document, but the rest should be ALL the fields from the original group entry having the matching _id.

The approach I have adopted can just about achieve this but has lots of problems and is probably inefficient (with repetition of the filter clause).

return workshopCollection.aggregate([
    {$match:{groups:{$elemMatch:{_id:groupId}}}},
    {$unwind:"$groups"},
    {$match:{"groups._id":groupId}},
    {$project:{
        _id:"$groups._id",
        role:"$groups.role",
        workshopId:"$_id",
    }},
]).toArray();

Worse, since it explicitly includes named fields from the entry, it will omit any future fields which are added to the records. I also can't generalise this lookup operation to the case of 'invitations' or other embedded named arrays unless I can know what the array entries' fields are in advance.

I have wondered if using the $ or $elemMatch operators within a $project stage of the pipeline is the right approach, but so far they have either been either ignored or triggered operator validity errors when running the pipeline.

QUESTION

Is there another aggregation operator or alternative approach which would help me with this fairly mainstream problem - to return only the matching entries from a document's array?

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
cefn
  • 2,895
  • 19
  • 28
  • No. Your question and usage is a bit off as well since you only ever need `$elemMatch` when you have *"more than one condition to match one or more properties"* of an array element. For everything else ( just like you are querying with one condition on a single property here ) you use "dot notation" just like you did in the second `$match` statement. – Neil Lunn Apr 07 '16 at 12:51
  • Your aggregation query looks fine to me. And you can't create a generic aggregation query that works with all types of embedded object arrays without knowing the fields. That would make it more complex and error prone. – SiddAjmera Apr 07 '16 at 12:52
  • Bottom line is that if you want array members to just appear as if they were a top level document you **must** use `$unwind` and then `$project` the final document. Alternately you can just accept and live with processing the basic documents you receive in a cursor result and tranforming output to where you send it on to. Simple case of iterating each array member in each document returned. – Neil Lunn Apr 07 '16 at 12:54
  • Thanks for the helpful comments. Been investigating one of the problems (duplicate filtering) a little further and I think it comes down to the following issue. The findOne operation can return a result like... ` return workshopCollection.findOne( {"groups":{$elemMatch:{_id:groupId}}}, {"groups.$":true} ); ` where one argument is a match and the other is a projection but these same specifications are not possible in an aggregation pipeline. – cefn Apr 07 '16 at 13:26
  • Thanks also for clarifying that serving up array entries as top-level documents is impossible, as this is one of the things I assumed must be possible, but that I was somehow missing how to do it server-side. – cefn Apr 07 '16 at 13:31
  • I have added an answer which efficiently and generally addresses my requirement (a single match in a collection). It could be generalised for multiple matches by using find() instead of findOne(), but this will only return the first matching array entry per document so still does not handle the general case of multiple matches. For the general case it seems impossible to avoid a duplicate filtering clause, and for all cases the remapping of arbitrary array entries as top-level documents needs to be done in application code, and cannot be handed off to the server. – cefn Apr 07 '16 at 14:14

1 Answers1

0

The implementation below can handle arbitrary queries, serves results as a 'top-level document' and avoids duplicate filtering in the pipeline.

function retrieveArrayEntry(collection, arrayName, itemMatch){
    var match = {};
    match[arrayName]={$elemMatch:itemMatch};
    var project = {};
    project[arrayName+".$"] = true;
    return collection.findOne(
        match,
        project
    ).then(function(doc){
        if(doc !== null){
            var result = doc[arrayName][0];
            result._docId = doc._id;
            return result;
        }
        else{
            return null;
        }
    });
}

It can be invoked like so...

retrieveArrayEntry(workshopCollection, "groups", {_id:ObjectId("57064a294a54b66c1f961acb")})

However, it relies on the collection findOne(...) method instead of aggregate(...) so will be limited to serving the first matching array entry from the first matching document. Projections referencing an array match clause are apparently not possible through aggregate(...) in the same way they are through findXXX() methods.

A still more general (but confusing and inefficient) implementation allows retrieval of multiple matching documents and subdocuments. It works around the difficulty MongoDb has with syntax consistency of Document and Subdocument matching through the unpackMatch method, so that an incorrect 'equality' criterion e.g. ...

{greetings:{_id:ObjectId("437908743")}}

...gets transferred into the required syntax for a 'match' criterion (as discussed at Within a mongodb $match, how to test for field MATCHING , rather than field EQUALLING )...

{"greetings._id":ObjectId("437908743")}

Leading to the following implementation...

function unpackMatch(pathPrefix, match){
    var unpacked = {};
    Object.keys(match).map(function(key){
        unpacked[pathPrefix + "." + key] = match[key];
    })
    return unpacked;
}

function retrieveArrayEntries(collection, arrayName, itemMatch){

    var matchDocs = {},
        projectItems = {},
        unwindItems = {},
        matchUnwoundByMap = {};

    matchDocs.$match={};
    matchDocs.$match[arrayName]={$elemMatch:itemMatch};

    projectItems.$project = {};
    projectItems.$project[arrayName]=true;

    unwindItems.$unwind = "$" + arrayName;

    matchUnwoundByMap.$match = unpackMatch(arrayName, itemMatch);

    return collection.aggregate([matchDocs, projectItems, unwindItems, matchUnwoundByMap]).toArray().then(function(docs){
        return docs.map(function(doc){
            var result = doc[arrayName];
            result._docId = doc._id;
            return result;
        });
    });
}
Community
  • 1
  • 1
cefn
  • 2,895
  • 19
  • 28