
I'm very new to Mongo, and am just getting my head around the core concepts...I'm implementing a schema for time series data, and am planning to try the pattern suggested here: MongoDB as a Time Series Database, which has also appeared in some Mongo presentations.

I understand the schema, but am having difficulty working out how one might query it for a range of dates. More specifically, can someone show an example of how to query the schema at the link above to retrieve a 1-minute series that spans multiple hours/days? Ideally, without the need for post processing outside of Mongo.

The Mongo docs and aggregation pipeline appear mostly concerned with processing arrays rather than nested subdocuments...TIA.

EDIT: To add more clarity to the specific problem I'm trying to solve...

Let's say I'm storing data in 1 minute intervals, with one parent document per day, using the following schema (snipped from the post linked to above):

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "spot_EURUSD",
  values: {
    0: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343},
    1: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343},
    …,
    22: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343},
    23: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343}
  }
}
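
For reference, this is how I understand a single minute's write would address the nested structure (a hypothetical Python sketch; the helper name is mine, not from the linked post):

```python
from datetime import datetime, timezone

# Hypothetical helper: given a timestamp, compute the nested
# `values.<hour>.<minute>` dot-path a write for that minute would
# target in the day-per-document schema above.
def value_path(ts: datetime) -> str:
    return f"values.{ts.hour}.{ts.minute}"

print(value_path(datetime(2013, 10, 10, 23, 7, tzinfo=timezone.utc)))
# -> values.23.7
```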

What would be the most effective/efficient way to satisfy queries of the form: "Give me a chronological list of values, 1 per minute, starting on 2013-09-25 at 1:37pm, and ending on 2013-10-15 at 2:56pm"?

stephenrs

1 Answer


There's a comment from @jtromans on the link you gave that should point you in the right direction:

...you should continue 'binning' your data accordingly down to the finest resolution necessary to satisfy the criteria

So assuming you have a schema like the following:

{
    timestamp_hour: ISODate(...),
    values: {
        0: {},
        1: {},
        ...
        59: {}
    }
}

Then you have a subdocument per minute, which will allow you to satisfy your query fairly simply (to find every value in the 4th minute, for example):

collection.find({}, {"values.3": 1})

This is just a projection to filter only the minute values you are interested in. Since it's otherwise a full collection scan, you'd probably want to include a date range on the timestamp_hour field to limit the search. You can use an aggregation if you like to project the values into a shape better suited to the format you expect, for example:

collection.aggregate([
    {$project: {val: "$values.1"}}
])
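
Putting the two together: the filter and pipeline stages are just nested documents, so they're easy to build programmatically. A pymongo-style sketch (the function name and date bounds here are my own examples, not part of the original schema):

```python
from datetime import datetime, timezone

# Sketch only: build the stages you would pass to collection.aggregate(...),
# combining a date-range $match on timestamp_hour with the minute projection.
def minute_slice_pipeline(start: datetime, end: datetime, minute: int):
    return [
        {"$match": {"timestamp_hour": {"$gte": start, "$lt": end}}},
        {"$project": {"val": f"$values.{minute}"}},
    ]

stages = minute_slice_pipeline(
    datetime(2013, 10, 10, tzinfo=timezone.utc),
    datetime(2013, 10, 11, tzinfo=timezone.utc),
    minute=1,
)
```

With an index on timestamp_hour, the leading $match keeps the scan bounded before any reshaping happens.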

If you need to be able to filter on hours, seconds, or other portions of the time, then you will need either bins or keys for those in your schema, e.g. binning both seconds and minutes:

{
    timestamp_hour: ISODate(...),
    minutes: {
        0: {
            seconds: {
                0: ...
            }
        },
        ...
    }
}
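
Addressing a single second's slot in that shape follows the same dot-path pattern as before (hypothetical helper, assuming the per-hour document above):

```python
from datetime import datetime, timezone

# Hypothetical helper mirroring the seconds-binned schema above:
# compute the dot-path for one second's slot within an hour document.
def second_path(ts: datetime) -> str:
    return f"minutes.{ts.minute}.seconds.{ts.second}"

print(second_path(datetime(2013, 10, 10, 23, 7, 42, tzinfo=timezone.utc)))
# -> minutes.7.seconds.42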

Or, for example, adding other values as keys so they can be indexed and filtered:

{
    timestamp_hour: ISODate(...),
    hour_of_day: 0,
    day_of_month: 1,
    minutes: {
        ...
    }
}
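
Those extra keys are redundant with the timestamp, so they would typically be computed once at write time. A hedged sketch (field names follow the schema above; the function is illustrative only):

```python
from datetime import datetime, timezone

# Illustrative only: derive the indexable/filterable key fields shown
# above from the raw timestamp when writing the bucket document.
def bucket_keys(ts: datetime) -> dict:
    return {
        "timestamp_hour": ts.replace(minute=0, second=0, microsecond=0),
        "hour_of_day": ts.hour,
        "day_of_month": ts.day,
    }

doc = bucket_keys(datetime(2013, 10, 10, 23, 15, tzinfo=timezone.utc))
# doc["hour_of_day"] == 23, doc["day_of_month"] == 10
```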

Note that I've used a document-per-hour approach here; you'll have to decide, based on your data and requirements, whether that suits you or whether you want a document per minute, per day, etc.

EDIT: Here is an example that better matches the edited question:

db.ts.aggregate([
    {
        $match: {
            timestamp_hour: {$gte: ISODate("2013-09-25"), $lte: ISODate("2013-10-15")}
        }
    },
    {
        $project: {
            timestamp_hour: 1,
            hours: {$objectToArray: "$values"}
        }
    },
    {
        $unwind: "$hours"
    },
    {
        $project: {
            timestamp_hour: 1,
            hour_index: "$hours.k",
            minutes: {$objectToArray: "$hours.v"}
        }
    },
    {
        $unwind: "$minutes"
    },
    {
        $project: {
            reconstructed_date: {$dateFromParts: {
                year: {$year: "$timestamp_hour"},
                month: {$month: "$timestamp_hour"},
                day: {$dayOfMonth: "$timestamp_hour"},
                // $objectToArray produces string keys, so convert them
                // ($toInt requires MongoDB 4.0+)
                hour: {$toInt: "$hour_index"},
                minute: {$toInt: "$minutes.k"}
            }},
            value: "$minutes.v"
        }
    },
    {
        $match: {
            reconstructed_date: {$gte: ISODate("2013-09-25T13:37:00.000Z"), $lte: ISODate("2013-10-15T14:56:00.000Z")}
        }
    }
])
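
To see what the two $objectToArray/$unwind rounds are doing, here's a client-side Python equivalent that flattens one document into chronological (timestamp, value) pairs. This is a sketch of the same logic for illustration, not a substitute for doing the work server-side; it assumes the parent document is keyed at day granularity:

```python
from datetime import datetime, timedelta, timezone

# Client-side equivalent of the $objectToArray + $unwind stages:
# flatten the hour-keyed, minute-keyed subdocuments into a sorted
# list of (minute_timestamp, value) pairs. Keys are strings, as in
# BSON, so they are sorted numerically via int().
def flatten(doc):
    base = doc["timestamp_hour"].replace(hour=0, minute=0, second=0, microsecond=0)
    out = []
    for h, minutes in sorted(doc["values"].items(), key=lambda kv: int(kv[0])):
        for m, v in sorted(minutes.items(), key=lambda kv: int(kv[0])):
            out.append((base + timedelta(hours=int(h), minutes=int(m)), v))
    return out

doc = {
    "timestamp_hour": datetime(2013, 10, 10, tzinfo=timezone.utc),
    "values": {"0": {"0": 1.10, "1": 1.11}, "13": {"37": 1.23}},
}
series = flatten(doc)
# series runs from 2013-10-10 00:00 (1.10) to 2013-10-10 13:37 (1.23)
```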

I haven't attempted to get timezones right in this one; that's up to you!

Logan Pickup
  • Thanks for your answer, but I think what I'm looking for may be simpler than what you've described. I'd like to just do a vanilla date range query, not filter for specific time slots. I've edited my question for clarity. Any thoughts? – stephenrs Dec 18 '17 at 05:01
  • Ah, I see. You'll need to use `$objectToArray` in a `$project` stage so you can then use `$unwind` in a later stage (unwind only works on arrays; you might want to switch from objects to arrays for this specific reason, as your aggregate will get messy); to get all minutes, instead of a specific one, and then select something in that minute (the first item, if the value is an array or subdocument instead of a value) you'll need to do this _twice_. – Logan Pickup Dec 18 '17 at 07:01
  • The time range query will get complex too - you'll need one initial `$match` stage on the `timestamp_hour` field (to get the document set to a reasonable size, and you'll need to be careful with your bounds), and you'll need a `$project` stage to create new fields with the hour/minute/second values so you have a final `$match` stage to filter out unwanted values in the first/last days. Overall, getting quite complex and I'd want to run tests on production-sized data sets to make sure the query performance is going to be ok. – Logan Pickup Dec 18 '17 at 07:04
  • See updated answer for an example - it's far from elegant at this point! – Logan Pickup Dec 18 '17 at 07:18
  • This is really great, Logan. Your example is highly instructional, not only for solving this particular problem, but also for getting a solid sense of how to assemble complex pipelines in general. Also, now I can see clearly how write-optimized that example schema is, and I tend to agree that arrays will likely be a better fit for my use case. Thanks!! – stephenrs Dec 19 '17 at 00:07