1

I have a time series that grows and is (potentially) revised through time:

on "2013-01-01": First version of the data

"2013-01-01" 10

on "2013-01-02": Data of the 1st of Jan is revised from 10 to 11

"2013-01-01" 11

on "2013-02-01": First version of the data of the 1st of Feb

"2013-01-01" 11
"2013-02-01" 20

on "2013-02-02": Data of the 1st of Feb is revised from 20 to 21

"2013-01-01" 11
"2013-02-01" 21

most frequent queries:

query1: get the most recent version of all dates

"2013-01-01" 11
"2013-02-01" 21

query2: get the time series as it was known at a certain date:

For instance, querying with "2013-02-01", I need to get
"2013-01-01" 11
"2013-02-01" 20

Note that query1 is a the same as query2 but with date = current date

I need help to structure my documents, and as I come from a relational background, I am not sure about the implications of my structure. I have basically identified 2 possible structure, and would be happy to have some feedbacks, or suggestions of other structure.

OPTION A: Each revision in a separate document

{
  "id":"1",
  "date":"2013-01-01",
  "version_date":"2013-01-01",
  "value":10
}

{
  "id":"1",
  "date":"2013-01-01",
  "version_date":"2013-01-02",
  "value":11
}

{
  "id":"1",
  "date":"2013-02-01",
  "version_date":"2013-02-01",
  "value":20
}

{
  "id":"1",
  "date":"2013-02-01",
  "version_date":"2013-02-02",
  "value":21
}

OPTION B: One document contains all the revisions of one date

{
  "id":"1",
  "date":"2013-01-01",
  "values" : [ 
              { "version_date":"2013-01-01",
                "value":10
              },
              {
                "version_date":"2013-01-02",
                "value":11
              }
}

{
  "id":"1",
  "date":"2013-02-01",
  "values" : [ 
              { "version_date":"2013-02-01",
                "value":20
              },
              {
                "version_date":"2013-02-02",
                "value":21
              }
}

In option B, I am also concerned by the fact that it might be a bit more difficult to perform the update query as the document has a growing part, which i am not sure is very well supported by / optimised for mongodb

EDIT: I am also considering option C to speed up query1: (although it might slow down a bit the writing)

{
  "id":"1",
  "date":"2013-01-01",
  "values" : [ 
              { "version_date":"2013-01-01",
                "value":10
              },
              {
                "version_date":"2013-01-02",
                "value":11
              }
  "last_value":11
}

{
  "id":"1",
  "date":"2013-02-01",
  "values" : [ 
              { "version_date":"2013-02-01",
                "value":20
              },
              {
                "version_date":"2013-02-02",
                "value":21
              }
  "last_value":21
}
Community
  • 1
  • 1
RockScience
  • 17,932
  • 26
  • 89
  • 125

3 Answers3

1

As with all questions like this, you are the only person who can answer this. If you have your data - try both way do some benchmarking on real data with real queries and compare what is better. If you do not have data - try to simulate it.

Keep in mind that with option B and C you have to be aware of 16 Mb limit per document. So if you have a lot of versions - you might reach the limit (but you have to understand that a there should be too many versions to reach 16Mb). Also keep in mind that updating such documents can and up with many moves on the disk.

Option B and C would be nice if you would need to select all revisions of a particular document at once, but I have not found this in your most often queries. Keep in mind that with right indexes you can achieve this as well with option A.

Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
  • Thanks for the input, I will keep in mind this limitation of 16Mb per document - should be fine in my case. Indeed the most frequent query is the other way round: get the entire time series for one version. – RockScience Oct 31 '13 at 09:33
  • 1
    If this is the most frequent data, then option B/C might perform well. But I have to highlight it one more time: do benchmarking with your own data and only after this select a suitable option – Salvador Dali Oct 31 '13 at 09:49
1

There's actually a very recent blog post on the official page covering this topic: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb Take a look at that and ask any additional questions if required.

xeraa
  • 10,456
  • 3
  • 33
  • 66
0

Considering the above mentioned Options, and your requirements, it would be best to create your structure based on date , like you mentioned in Option-B.Also it would be nice if your date is indexed. Some scenarios (easy reads,updates) that show why this seems to be the proper optimized solution are:

  1. Retrieving all the versions of a particular date.
  2. Retrieve all versions of a certain period of time(that is range, e.g from jan 2012 to feb 2012)
  3. Inserting a new version, you just have to use $push.
  4. Removing an old version , just with a simple query using $pull.
Jhanvi
  • 5,069
  • 8
  • 32
  • 41
  • ok, thank you for your answer. is it easy to perform query2: get the version of a certain date (ie: value for max version_date for all date)? Or do I have to get all and perform this subset in JSON in my own programing language – RockScience Oct 31 '13 at 09:14
  • @RockScience , no logic in programming language is required, you just have to use [$max](http://docs.mongodb.org/manual/reference/operator/meta/max/) correctly and it will be done. – Jhanvi Oct 31 '13 at 09:18
  • @RockScience yes, just also refer [$elemMatch](http://docs.mongodb.org/manual/reference/operator/query/elemMatch/) , it will be helpful for your structure. – Jhanvi Oct 31 '13 at 09:21