1

Say I have an entry for every day in the year (or possibly every hour, every minute, ...). What I'd like to do is query all rows that are in between the range of two dates and only return one entry for every interval n (e.g. one entry each week or one entry every second day, ...)

For a more specific example, my database has entries like this:

{ _id: ..., date: ISODate("2014-07-T01:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }

I want every result between 2014-12-05 and 2015-02-05 but only one every 3 days. The result set should look like this:

{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...

Can this be done somehow?

Ke Vin
  • 2,004
  • 1
  • 18
  • 28

2 Answers2

3

Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:

db.coll.aggregate([
    {$match: {
        date: {
            $gte: ISODate("2014-12-08T12:00:00.000Z"),
            $lt: ISODate("2014-12-12T00:00:00.000Z")
        }
    }},
    {$project:
        { date:1,
          value: 1,
          grp: { $let: 
                 {
                   vars: { delta:{$subtract:["$date", ISODate("2014-12-08T12:00:00.000Z")]}},
                   in: {$subtract:["$$delta", {$mod:["$$delta",3*24*3600*1000]}]}
                 }
               }
        }
    },
    {$sort: { date: 1 }},
    {$group: {_id:"$grp", date: {$first:"$date"}, value: {$first: "$value"}}}
])
  • the $match step will keep only rows in the desired range;
  • the project step will keep date and value, and will compute a "group number" based on the date. delta is the time difference in ms between the given date and some arbitrary application dependent origin. As MongoDB does not have the integer division operator, I use a substitute: delta-mod(delta, 3*24*3600*1000). This will change every 3 days (3 days × 24 hours × 3600 sec × 1000 ms);
  • the $sort step is maybe not required depending your use case. I use it in order to ensure a deterministic result when keeping the first date and value of each group in the next step;
  • finally (!) $group will group documents by the grp value calculated before, keeping only the first date and value of each group.
Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125
  • Thank you! This is working perfectly fine :) I also thought about grabbing all rows and just filter them manually in my program. Some tables only have one entry per day and the interval is only something between 2-7 days. Do you know how the performance for your query will likely be with big data sets? – Ke Vin May 19 '15 at 20:41
  • @KeVin It is always hard to talk _a priori_ about performance. But here, I would say the whole thing is dependent on the `$match` step. With an index on the required field and if the number of documents going through the other stages is not excessively high, this should perform decently well. That being say, only a proper benchmarking on your actual dataset will give you a definitive answer. – Sylvain Leroux May 19 '15 at 21:11
0

You can query for ranges using the following syntax:

db.collection.find( { field: { $gt: value1, $lt: value2 } } );

In your case, field would be the date field and this question may help you format the values:

return query based on date

Edit: I did not see the requirement for retrieving every nth document. In that case, I'm not sure MongoDB has built in support for that. You may have to manipulate the returned array yourself. In this case, once you get the range you can filter by index. Here's some boilerplate (I couldn't figure out an efficient use of Array.prototype.filter since that function removes the need for indices -- the opposite of what you want.):

var result =[]
for (var i = 0; i < inputArray.length ; i+=3) {     
    result.push(numList[i]);        
}
return result;
Community
  • 1
  • 1
NicholasFolk
  • 1,021
  • 1
  • 8
  • 14
  • The key problem here is to group by 3 days. – Sylvain Leroux May 08 '15 at 15:45
  • Ah yes, my mistake. I got ahead of myself and begun to answer before I had gathered all requirements of the question (valuable lesson to be learned in this industry). I will amend my question with a possible solution – NicholasFolk May 08 '15 at 16:22