
I have the documents below saved in a MongoDB collection, sorted in ascending order by created_at. I want to get only one document per specified time interval. (I'm using Node.js with the node-mongodb driver.) How should I implement this?

{"created_at":"2013-03-19T07:14:05Z"}
{"created_at":"2013-03-19T07:35:40Z"}
{"created_at":"2013-03-19T07:59:52Z"}
{"created_at":"2013-03-19T08:01:32Z"}
{"created_at":"2013-03-19T08:02:40Z"}
{"created_at":"2013-03-19T08:02:56Z"}
{"created_at":"2013-03-19T08:06:24Z"}
{"created_at":"2013-03-19T08:07:08Z"}
{"created_at":"2013-03-19T08:23:27Z"}
{"created_at":"2013-03-19T08:27:44Z"}
{"created_at":"2013-03-19T08:27:58Z"}
{"created_at":"2013-03-19T08:28:04Z"}
{"created_at":"2013-03-19T08:28:08Z"}
{"created_at":"2013-03-19T08:28:23Z"}

For example, if the time interval is 1 minute, the expected result is as follows.

{"created_at":"2013-03-19T07:14:05Z"}
{"created_at":"2013-03-19T07:35:40Z"}
{"created_at":"2013-03-19T07:59:52Z"}
{"created_at":"2013-03-19T08:01:32Z"}
{"created_at":"2013-03-19T08:02:40Z"}
{"created_at":"2013-03-19T08:06:24Z"}
{"created_at":"2013-03-19T08:07:08Z"}
{"created_at":"2013-03-19T08:23:27Z"}
{"created_at":"2013-03-19T08:27:44Z"}
{"created_at":"2013-03-19T08:28:04Z"}

The documents below should not be returned.

{"created_at":"2013-03-19T08:02:56Z"}
{"created_at":"2013-03-19T08:27:58Z"}
{"created_at":"2013-03-19T08:28:08Z"}
{"created_at":"2013-03-19T08:28:23Z"}

Thanks,

Jeffrey


1 Answer


Map/Reduce is what you are looking for.

Think of your collection this way: each document gets an ID derived from created_at, or more precisely from the part of created_at up to the minute. For example, this function can be used to determine the ID:

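// Builds one key per calendar minute.
// Assumes date is a Date object; if created_at is stored as a string,
// convert it with new Date(...) before calling this.
var GenerateID = function(date) {
    return date.getFullYear() + "/" +
           date.getMonth() + "/" +   // zero-based (0 = January); fine for a key
           date.getDate() + "." +
           date.getHours() + ":" +
           date.getMinutes();
};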

So this function converts a Date object to a string containing the year, month, day, hour and minute. We don't care about seconds, because you want only one object per minute.
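To see which documents end up sharing an ID, here is a minimal sketch that runs in plain Node.js. It assumes created_at is stored as an ISO 8601 string, as shown in the question, so it parses the string first. minuteKey is a hypothetical UTC-based variant of GenerateID (the original uses local-time getters, so its output depends on the machine's time zone).

```javascript
// Hypothetical UTC variant of GenerateID: one key per calendar minute.
function minuteKey(date) {
    return date.getUTCFullYear() + "/" +
           (date.getUTCMonth() + 1) + "/" +   // getUTCMonth() is zero-based
           date.getUTCDate() + "." +
           date.getUTCHours() + ":" +
           date.getUTCMinutes();
}

// Timestamps taken from the question; parsed because they are strings there.
const samples = [
    "2013-03-19T08:02:40Z",
    "2013-03-19T08:02:56Z",
    "2013-03-19T08:06:24Z"
];

const keys = samples.map(function (s) { return minuteKey(new Date(s)); });
console.log(keys);
```

The first two timestamps fall in the same minute and therefore get the same key, while the third gets its own.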

Now you have to define the map and reduce functions. For example, map may look like this:

var map = function() {
    var key = GenerateID(this.created_at);
    emit(key, this);
};

and reduce:

var reduce = function(key, values) {
    if (values.length) {
        return values[0];
    }
};

Here we just return the first value we have (combined with sorting, this gives you what you want). Note that this is done per key, so we are good.
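The effect of map plus reduce can be checked without a database. The following is only a plain-JavaScript simulation of the idea, not what MongoDB runs internally. It assumes created_at is an ISO 8601 UTC string in exactly the format shown in the question, and minuteKey and firstPerMinute are hypothetical helper names.

```javascript
// Sample data from the question, already sorted ascending by created_at.
const docs = [
    "2013-03-19T07:14:05Z", "2013-03-19T07:35:40Z", "2013-03-19T07:59:52Z",
    "2013-03-19T08:01:32Z", "2013-03-19T08:02:40Z", "2013-03-19T08:02:56Z",
    "2013-03-19T08:06:24Z", "2013-03-19T08:07:08Z", "2013-03-19T08:23:27Z",
    "2013-03-19T08:27:44Z", "2013-03-19T08:27:58Z", "2013-03-19T08:28:04Z",
    "2013-03-19T08:28:08Z", "2013-03-19T08:28:23Z"
].map(function (ts) { return { created_at: ts }; });

// Hypothetical key helper: "2013-03-19T08:02:40Z" -> "2013-03-19T08:02".
// Only valid because every string has the same fixed-width format.
function minuteKey(isoString) {
    return isoString.slice(0, 16);
}

// Keeps the first document seen for each minute key.
function firstPerMinute(sortedDocs) {
    const seen = new Map();                      // key -> first doc for that minute
    for (const doc of sortedDocs) {
        const key = minuteKey(doc.created_at);   // the "map" step: compute the key
        if (!seen.has(key)) {                    // the "reduce" step: keep values[0]
            seen.set(key, doc);
        }
    }
    return Array.from(seen.values());            // Map preserves insertion order
}

const result = firstPerMinute(docs);
console.log(result.map(function (d) { return d.created_at; }));
```

With the sample data this yields the ten documents listed as the expected result in the question and drops 08:02:56, 08:27:58, 08:28:08 and 08:28:23.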

Now you have to run this job on the Mongo side. Depending on your driver, it may look like this:

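db.collection.mapReduce(
    map,
    reduce,
    {
        out: { inline: 1 },
        query: { /* your range query on created_at */ },
        sort: { created_at: 1 },  // ascending, so the earliest document per minute comes first
        scope: { GenerateID: GenerateID }  // makes GenerateID available inside map
    }
)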
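As a side note, MongoDB has deprecated mapReduce since version 5.0 in favor of the aggregation pipeline. The sketch below is an alternative technique, not the approach described in this answer: it groups on the first 16 characters of created_at and keeps the first document per group. It assumes created_at is stored as an ISO 8601 string as in the question, and firstPerMinutePipeline and its from/to parameters are hypothetical names.

```javascript
// Builds an aggregation pipeline that keeps the first document per calendar minute.
// Assumes created_at is a fixed-format string such as "2013-03-19T07:14:05Z",
// so plain string comparison matches chronological order.
function firstPerMinutePipeline(from, to) {
    return [
        // Range filter on the string timestamps (from inclusive, to exclusive).
        { $match: { created_at: { $gte: from, $lt: to } } },
        // Sort first so that $first picks the earliest document in each group.
        { $sort: { created_at: 1 } },
        // "2013-03-19T08:02:40Z" -> "2013-03-19T08:02" is the per-minute key.
        { $group: {
            _id: { $substrBytes: ["$created_at", 0, 16] },
            doc: { $first: "$$ROOT" }
        } },
        // Unwrap the stored document and restore ascending order.
        { $replaceRoot: { newRoot: "$doc" } },
        { $sort: { created_at: 1 } }
    ];
}

const pipeline = firstPerMinutePipeline("2013-03-19T00:00:00Z", "2013-03-20T00:00:00Z");
console.log(JSON.stringify(pipeline, null, 2));
```

With the Node.js driver, the pipeline would typically be passed to collection.aggregate(pipeline) and read with toArray(). If created_at is stored as a Date instead, the key expression and the range bounds would need to change accordingly.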

Here's the official MongoDB map/reduce overview:

http://docs.mongodb.org/manual/applications/map-reduce/

freakish