15

I can reference the values of individual values of attributes in MongoDB aggregation pipeline using '$' operator. But, how do I access (reference) the whole document ?


UPDATE: An example provided to explain scenario.

Here's an example of what I'm trying to do. I have a collection of tweets. And every tweet has a member 'clusters', which is an indication of to what cluster a particular tweet belongs to.

{
    "_id" : "5803519429097792069",
    "text" : "The following vehicles/owners have been prosecuted by issuing notice on the basis of photographs on dated... http://t.co/iic1Nn85W5",
    "oldestts" : "2013-02-28 16:11:32.0",
    "firstTweetTime" : "4 hours ",
    "id" : "307161122191065089",
    "isLoc" : true,
    "powertweet" : true,
    "city" : "new+delhi",
    "latestts" : "2013-02-28 16:35:05.0",
    "no" : 0,
    "ts" : 1362081807.9693,
    "clusters" : [
        {
            "participationCoeff" : 1,
            "clusterID" : "5803519429097792069"
        }
    ],
    "username" : "dtptraffic",
    "verbSet" : [
        "date",
        "follow",
        "prosecute",
        "have",
        "be"
    ],
    "timestamp" : "4 hours ",
    "entitySet" : [ ],
    "subCats" : {
        "Generic" : [ ]
    },
    "lang" : "en",
    "fns" : 18.35967,
    "url" : "url|109|131|http://fb.me/2CeaI7Vtr",
    "cat" : [
        "Generic"
    ],
    "order" : 7
} 

Since, there are some couple of hundred thousands tweets in my collection, I want to group all tweets by 'clusters.clusterID'. Basically, I would want to write a query like following:

db.tweets.aggregate (
{ $group : { _id : '$clusters.clusterID', 'members' : {$addToSet : <????> } } }
)

I want to access the presently processing document and reference it where I have put in the above query. Does anyone knows how to do this?

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
VaidAbhishek
  • 5,895
  • 7
  • 43
  • 59
  • 1
    do you have an example of what you are trying to do? – RickyA Feb 28 '13 at 19:15
  • 1
    in a nutshell - no, there is no way to do this (there would be if you knew all the key names, but that's unlikely to be helpful). – Asya Kamsky Feb 28 '13 at 20:03
  • you could do this in agg framework if you're willing to settle for a fixed set of fields of the original document. – Asya Kamsky Mar 01 '13 at 00:49
  • @AsyaKamsky Yes, you are right. But that is clunky, since i have tens of fields in the my document. And what if some fields are absent from some documents. There must be a mechanism to access the the whole document! – VaidAbhishek Mar 01 '13 at 11:17

3 Answers3

25

Use the $$ROOT variable:

References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
Volox
  • 1,068
  • 1
  • 12
  • 23
  • 1
    this question was asked when MongoDB 2.2 was current - $$ROOT was added in version 2.6 (early 2014) – Asya Kamsky Dec 26 '15 at 22:51
  • 1
    maybe you could respond [this question of mine](http://stackoverflow.com/questions/39288087/mongodb-collection-with-different-language-texts-select-localized-texts). The problem is that I would like to obtain the document itself, not as a subdocument, kind of `{ $group: $$ROOT }` which is not possible, and for the moment it can just be as a subdocument: `{ $group: { _id: '$$ROOT' } }` – Miquel Sep 02 '16 at 10:07
  • How to make this work when using a projection first? – Dane411 Jun 30 '17 at 21:07
2

There is currently no mechanism to access the full document in aggregation framework, if you only needed a subset of fields, you could do:

db.tweets.aggregate([ {$group: { _id: '$clusters.clusterID',
                                  members: {$addToSet :  
                                       { user: "$user",
                                         text: "$text", // etc for subset 
                                                        // of fields you want
                                       }
                                  } 
                               } 
                       } ] )

Don't forget with a few hundred thousand tweets, aggregating the full document will run you into the 16MB limit for returned aggregation framework result document.

You can do this via MapReduce like this:

var m = function() {
  emit(this.clusters.clustersID, {members:[this]});
}

var r = function(k,v) {
  res = {members: [ ] };
  v.forEach( function (val) {
     res.members = val.members.concat(res.members);
  } );
  return res;
}

db.tweets.mapReduce(m, r, {out:"output"});
Asya Kamsky
  • 41,784
  • 5
  • 109
  • 133
  • I had the same issue and BatScream offered the following solution. http://stackoverflow.com/questions/34404834/how-to-group-and-select-document-corresponding-to-max-within-each-group-in-mongo?noredirect=1#comment56552218_34404834. He suggested accessing full document via $$ROOT – user1700890 Dec 24 '15 at 15:06
  • $$ROOT was introduced in 2.6 and was not available at the time of this question/answer. https://jira.mongodb.org/browse/SERVER-9840 – Asya Kamsky Dec 26 '15 at 22:50
-2

I think MapReduce more useful for this task.

As written in the comments by Asya Kamsky, my example is incorrect for mongodb, please use official docs for mongoDB.

Silver_Clash
  • 397
  • 1
  • 7
  • you're right that map/reduce can do this, but what you gave here will not work. Your map is slightly wrong, and your reduce function seems to be missing entirely. – Asya Kamsky Mar 01 '13 at 00:47
  • that's not how map/reduce works.Your reduce function has to return the same format that your map function emits, and it also may be called more than once. Your test may have given the "right" looking answer for some small test set, but it won't work correctly on real data. – Asya Kamsky Mar 01 '13 at 08:30
  • 1
    see the docs page for mapReduce. http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#requirements-for-the-reduce-function lists both those facts (plus the fact that reduce won't be called at all for mapped keys that only occur once) – Asya Kamsky Mar 01 '13 at 13:48