6

How can I aggregate my MongoDB results by ObjectId date. Example:

Default cursor results:

cursor = [
    {'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
    {'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'},
    {'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'},
]

Projected results:

projected_cursor = [
    {'2013-09-08':
        {'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
        {'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'}
    },
    {'2013-09-07':
        {'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'}
    }
]

This is what I'm currently using in PyMongo to achieve these results, but it's messy and I'd like to see how I can do it using MongoDB's aggregation framework (or even MapReduce):

cursor = db.find({}, limit=10).sort("_id", pymongo.DESCENDING)
messages = [x for x in cursor]
this_date = lambda x: x['_id'].generation_time.date()
dates = set([this_date(message) for message in messages])
dates_dict = {date: [m for m in messages if this_date(m) == date] for date in dates}

And yes, I know that the easiest way would be to simply add a new date field to each record then aggregate by that, but that's not what I want to do right now.

Thanks!

Dan Gayle
  • 2,277
  • 1
  • 24
  • 38

3 Answers3

10

Update: There is a built in way to do this now, see https://stackoverflow.com/a/51766657/295687

There is no way to accomplish what you're asking with mongodb's aggregation framework, because there is no aggregation operator that can turn ObjectId's into something date-like (there is a JIRA ticket, though). You should be able to accomplish what you want using map-reduce, however:

// map function
function domap() {
    // turn ObjectId --> ISODate
    var date = this._id.getTimestamp();
    // format the date however you want
    var year = date.getFullYear();
    var month = date.getMonth();
    var day = date.getDate();

    // yields date string as key, entire document as value
    emit(year+"-"+month+"-"+day, this);
}

// reduce function
function doreduce(datestring, docs) {
    return {"date":datestring, "docs":docs};
}
mousetail
  • 7,009
  • 4
  • 25
  • 45
llovett
  • 1,449
  • 1
  • 12
  • 21
  • Thanks. I'll have to test the performance of this vs. doing it in Python like in my example. I never know if I should be pushing stuff like this to the DB or not. – Dan Gayle Sep 19 '13 at 19:58
  • Also, thanks for the ticket ref, I posted this thread to there also. – Dan Gayle Sep 19 '13 at 20:05
  • This answer is wrong. There is a simple native method in the aggregation framework to convert an ObjectID to a date. See this answer: https://stackoverflow.com/a/51766657/295687 – Nick Apr 23 '21 at 07:01
5

The Jira Ticket pointed out by llovett has been solved, so now you can use date operators like $isoWeek and $year to extract this information from an ObjectId.

Your aggregation would look something like this:

{
    "$project":
        {

            "_id": {
                "$dateFromParts" : {
                    "year": { "$year": "$_id"},
                    "month": { "$month": "$_id"},
                    "day": { "$dayOfMonth": "$_id"}
                }
            }
        }
}
Fabricio Buzeto
  • 1,243
  • 1
  • 18
  • 29
  • 1
    This is the answer! And to convert the ObjectID to a full timestamp, you can use: `{ $project: _id: { $toDate: "$_id" } }` – Nick Apr 23 '21 at 06:58
2

So this doesn't answer my question directly, but I did find a better way to replace all that lambda nonsense above using Python's setdefault:

d = {}
for message in messages:
    key = message['_id'].generation_time.date()
    d.setdefault(key,[]).append(message)

Thanks to @raymondh for the hint in is PyCon talk:

Transforming Code into Beautiful, Idiomatic Python

Dan Gayle
  • 2,277
  • 1
  • 24
  • 38