
I have been reading a few blogs on this topic, such as https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-1-introduction. It seems that storing related time-series data within a few documents (or a single document) is a better approach than storing each data point as its own document. But I am wondering whether storing it in a single doc (forget the size-bucket approach for a moment) fits my use case: a) regarding updates, I occasionally need to replace the entire time series; b) I potentially need to read the sorted [<date1,value1>,<date2,value2>,....] by date range.
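
To make the bucketed layout concrete, here is a rough sketch with the MongoDB Java sync driver. The collection and field names (series, seriesId, points, date, value) are just placeholders I am using for illustration:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.time.Instant;
import java.util.Date;
import java.util.List;

public class InsertBucketExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("series");

            // One document holds the whole time series as an array of {date, value} pairs.
            Document bucket = new Document("seriesId", "sensor-42")
                    .append("points", List.of(
                            new Document("date", Date.from(Instant.parse("2020-01-01T00:00:00Z")))
                                    .append("value", 1.5),
                            new Document("date", Date.from(Instant.parse("2020-01-02T00:00:00Z")))
                                    .append("value", 2.0)));

            coll.insertOne(bucket);
        }
    }
}
```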

A few questions on my mind right now:

1) The common suggestion I have seen is not to embed a large array in a single doc if the size of the array is unbounded, because MongoDB may need to reallocate space for the document on update. I understand that this would be an issue with the old MMAPv1 storage engine. But from other answers, such as "WiredTiger and in-place updates" and "Does performing a partial update on a MongoDb document in WiredTiger provide any advantage over a full document update?", this doesn't seem to be a problem, because WiredTiger rewrites the entire document on update anyway and flushes it to disk, and it is optimized for larger documents. So I am wondering whether a large array in a single doc is still an issue with WiredTiger.

2) [Given that each array may contain 5000+ key-value pairs, each doc is around 0.5 - 1 MB.] It seems the query to retrieve the sorted time series by date range would be less complicated (fewer aggregation stages, since with the embedded array I need to unwind the subdocuments to sort and filter) if I stored each data point as its own doc. But in terms of disk and memory usage, the single-doc approach definitely has an advantage, since I am retrieving 1 doc vs n docs. So I am trying to work out where to draw the line here in terms of performance.
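
Roughly, this is the kind of pipeline I have in mind for the embedded-array layout (again using the placeholder schema from the sketch above, so the names are assumptions on my part):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.time.Instant;
import java.util.Date;
import java.util.List;

public class RangeQueryExample {
    public static void main(String[] args) {
        Date from = Date.from(Instant.parse("2020-01-01T00:00:00Z"));
        Date to   = Date.from(Instant.parse("2020-02-01T00:00:00Z"));

        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("series");

            // Unwind the embedded array, keep only the points in the date range,
            // sort them, then push them back into a single sorted array.
            List<Bson> pipeline = List.of(
                    Aggregates.match(Filters.eq("seriesId", "sensor-42")),
                    Aggregates.unwind("$points"),
                    Aggregates.match(Filters.and(
                            Filters.gte("points.date", from),
                            Filters.lt("points.date", to))),
                    Aggregates.sort(Sorts.ascending("points.date")),
                    Aggregates.group("$seriesId",
                            Accumulators.push("points", "$points")));

            for (Document doc : coll.aggregate(pipeline)) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```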

3) There are 3 approaches I could go for in scenario (a); a rough sketch of each is shown below.

  • Replace the whole doc (replaceOne / ReplaceOneModel)
  • Update only the time-series part of the doc (updateOne / UpdateOneModel)
  • Use a bulk update or insert to update each array element

Also, I am thinking about index rebuild issues. Given that indexes are not linked directly to the data files in WiredTiger, it seems approaches 1 and 2 are both acceptable, with 2 being better as it updates only the target part. Am I missing anything here?
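
To make the three options concrete, here is a rough sketch of each, again with the placeholder schema from above (ReplaceOneModel / UpdateOneModel are the bulk-write counterparts of the first two calls):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOneModel;
import com.mongodb.client.model.Updates;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.time.Instant;
import java.util.Date;
import java.util.List;

public class UpdateApproachesExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("series");

            Bson byId = Filters.eq("seriesId", "sensor-42");
            Date d1 = Date.from(Instant.parse("2020-01-01T00:00:00Z"));
            Date d2 = Date.from(Instant.parse("2020-01-02T00:00:00Z"));
            List<Document> newPoints = List.of(
                    new Document("date", d1).append("value", 3.25),
                    new Document("date", d2).append("value", 4.75));

            // Approach 1: replace the whole document.
            coll.replaceOne(byId,
                    new Document("seriesId", "sensor-42").append("points", newPoints));

            // Approach 2: update only the time-series array with $set.
            coll.updateOne(byId, Updates.set("points", newPoints));

            // Approach 3: bulk write with one operation per array element,
            // using the positional $ operator to target the matching element.
            List<WriteModel<Document>> perElementOps = List.of(
                    new UpdateOneModel<>(
                            Filters.and(byId, Filters.eq("points.date", d1)),
                            Updates.set("points.$.value", 3.25)),
                    new UpdateOneModel<>(
                            Filters.and(byId, Filters.eq("points.date", d2)),
                            Updates.set("points.$.value", 4.75)));
            coll.bulkWrite(perElementOps);
        }
    }
}
```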

4) I am also wondering what the scenarios are in which we should actually go for the single-data-point-per-doc approach.

I am not too familiar with how the storage engine or MongoDB itself works under the hood, so I would appreciate it if someone could shed some light on this. Thanks in advance.

Isaac Wong