I have a set of documents, each with a timestamp. I want MongoDB to collapse the ones that are duplicates of each other within a 3-minute window. Here is an example of what I mean:

Original Data:

[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]

After querying, it would be:

[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]

The second entry is dropped because it falls within the 3-minute bubble created by the first entry. My code already aggregates and removes duplicates that have the same fruit, but now I only want to combine the ones that fall within the same timestamp bubble.

josneville

1 Answer

We should be able to do this! First, let's split an hour into 3-minute 'bubbles':

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57]

Now, to group these documents we need to modify the timestamp a little. As far as I know, this isn't currently possible with the aggregation framework, so instead I will use the group() method.

To group fruits within the same time period, we need to round each timestamp down to the start of its 3-minute 'bubble'. In pseudocode: timestamp.minutes -= (timestamp.minutes % 3).
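As a rough illustration of that rounding step outside the database, here is a minimal sketch in plain JavaScript. The helper name floorToBubble and its bubbleMinutes parameter are assumptions made up for this example, not part of MongoDB, and it uses the standard Date object rather than the shell's ISODate so that it can run anywhere.

```javascript
// Hypothetical helper: round an ISO-8601 timestamp string down to the start
// of its fixed 'bubble'. bubbleMinutes is assumed to divide 60 evenly.
function floorToBubble(isoString, bubbleMinutes) {
    var timestamp = new Date(isoString);

    // drop seconds and milliseconds so every timestamp in a bubble matches
    timestamp.setUTCSeconds(0, 0);

    // subtract the remainder to land on the bubble's first minute
    var remainder = timestamp.getUTCMinutes() % bubbleMinutes;
    timestamp.setUTCMinutes(timestamp.getUTCMinutes() - remainder);

    return timestamp;
}

console.log(floorToBubble("2014-07-17T06:47:18Z", 3).toISOString());
// prints "2014-07-17T06:45:00.000Z"
console.log(floorToBubble("2014-07-17T06:55:18Z", 3).toISOString());
// prints "2014-07-17T06:54:00.000Z"
```

Note that these are fixed buckets anchored to the hour, so two timestamps less than 3 minutes apart can still land in different bubbles (for example 06:47 and 06:48).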

Here is the resulting query:

db.collection.group({
    keyf: function (doc) {
        var timestamp = new ISODate(doc.timestamp);

        // seconds (and milliseconds) must be equal across a 'bubble'
        timestamp.setUTCSeconds(0, 0);

        // round down to the nearest 3 minute 'bubble'
        var remainder = timestamp.getUTCMinutes() % 3;
        var bubbleMinute = timestamp.getUTCMinutes() - remainder;
        timestamp.setUTCMinutes(bubbleMinute);

        return { fruit: doc.fruit, 'timestamp': timestamp };
    },
    reduce: function (curr, result) {
        result.sum += 1;
    },
    initial: {
        sum : 0
    }
});

Example results:

[
    {
        "fruit" : "apple",
        "timestamp" : ISODate("2014-07-17T06:45:00Z"),
        "sum" : 2
    },
    {
        "fruit" : "apple",
        "timestamp" : ISODate("2014-07-17T06:54:00Z"),
        "sum" : 1
    },
    {
        "fruit" : "banana",
        "timestamp" : ISODate("2014-07-17T09:03:00Z"),
        "sum" : 1
    },
    {
        "fruit" : "orange",
        "timestamp" : ISODate("2014-07-17T14:24:00Z"),
        "sum" : 2
    }
]

To make this easier, you could precompute the 'bubble' timestamp and store it in each document as a separate field. The documents would then look something like this:

[
    {"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z", "bubble": "2014-07-17T06:54:00Z"}
]

Of course, this takes up more storage. However, with this document structure you can use the aggregation framework's aggregate() method [0]:

db.collection.aggregate([
    { $group: { _id: { fruit: "$fruit", bubble: "$bubble" }, sum: { $sum: 1 } } }
]);
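The precomputed 'bubble' field described above could be filled in by the application before the documents are inserted. The following is only a sketch under assumptions: addBubble is a hypothetical helper name, the bubble width is fixed at 3 minutes, and every timestamp is assumed to be an ISO-8601 UTC string like the ones in the examples.

```javascript
// Hypothetical helper: return a copy of doc with a precomputed 'bubble' field.
// Assumes doc.timestamp is an ISO-8601 UTC string, as in the examples above.
function addBubble(doc) {
    var bubble = new Date(doc.timestamp);

    // zero the seconds and milliseconds, then round down to a multiple of 3 minutes
    bubble.setUTCSeconds(0, 0);
    bubble.setUTCMinutes(bubble.getUTCMinutes() - (bubble.getUTCMinutes() % 3));

    // toISOString() always emits milliseconds; strip them to match the
    // timestamp format used in the example documents
    var bubbleString = bubble.toISOString().replace(".000Z", "Z");

    return Object.assign({}, doc, { bubble: bubbleString });
}

var prepared = [
    { fruit: "apple", timestamp: "2014-07-17T06:45:18Z" },
    { fruit: "apple", timestamp: "2014-07-17T06:47:18Z" },
    { fruit: "apple", timestamp: "2014-07-17T06:55:18Z" }
].map(addBubble);

console.log(JSON.stringify(prepared, null, 2));
```

The prepared array can then be passed to whatever insert call your driver provides, after which the $group stage above only has to compare stored values.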

Hope that helps!

[0] MongoDB aggregation comparison: group(), $group and MapReduce

Justin Case