I have been reading a few blog posts on this topic, such as https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-1-introduction. It seems that storing related time-series data in a few documents (or a single document) is a better approach than storing each data point as its own document. But I am wondering whether storing everything in a single document (setting the size-bucket approach aside for a moment) fits my use case:
a) Updates: I occasionally need to replace the entire time series.
b) Reads: I potentially need to read the sorted pairs [<date1,value1>, <date2,value2>, ...] filtered by date range.
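For concreteness, here is a sketch of the two layouts I am weighing; the collection shape, field names, and one-bucket-per-sensor-per-day granularity are my own placeholders, not something from the blog post:

```javascript
// Hypothetical bucket document: one per sensor per day (all names are placeholders).
const bucketDoc = {
  _id: "sensor-42:2024-01-15",
  sensorId: "sensor-42",
  day: new Date("2024-01-15T00:00:00Z"),
  // In practice this array would hold 5000+ <date, value> pairs (~0.5-1 MB per doc).
  series: [
    { date: new Date("2024-01-15T00:00:00Z"), value: 21.4 },
    { date: new Date("2024-01-15T00:01:00Z"), value: 21.6 },
  ],
};

// The alternative layout stores each pair as its own small document:
const pointDoc = {
  sensorId: "sensor-42",
  date: new Date("2024-01-15T00:00:00Z"),
  value: 21.4,
};
```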
A few questions on my mind right now:
1) The common advice I have seen is not to embed a large array in a single document if the array's size is unbounded, because MongoDB may need to relocate the document to new space on update. I understand that this was an issue with the old MMAPv1 storage engine. But from answers such as "WiredTiger and in-place updates" and "Does performing a partial update on a MongoDb document in WiredTiger provide any advantage over a full document update?", this no longer seems to be a problem: WiredTiger reconstructs the entire document on update and flushes it to disk anyway, and it is optimized for large documents. So is a large array in a single document still an issue with WiredTiger?
2) [Each array may contain 5000+ key-value pairs, so each document is around 0.5-1 MB.] The query to retrieve the sorted time series by date range seems less complicated if I store each data point as its own document (fewer aggregation stages, since with the embedded array I need to unwind the subdocuments in order to sort and filter). But in terms of disk and memory usage, the single-document approach clearly has the advantage, since I am retrieving 1 document instead of n documents. How should I draw the line here in terms of performance?
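To make the read-side comparison concrete, here is roughly what the date-range read looks like in each layout. These are sketches in Node-driver/mongosh style; the collection and field names are my assumptions:

```javascript
const from = new Date("2024-01-15T06:00:00Z");
const to = new Date("2024-01-15T12:00:00Z");

// Bucketed layout: unwind the embedded array, then filter and sort the pairs.
// Would run as db.buckets.aggregate(pipeline).
const pipeline = [
  { $match: { sensorId: "sensor-42", "series.date": { $gte: from, $lt: to } } },
  { $unwind: "$series" },
  { $match: { "series.date": { $gte: from, $lt: to } } }, // re-filter individual pairs
  { $sort: { "series.date": 1 } },
  { $project: { _id: 0, date: "$series.date", value: "$series.value" } },
];

// One-document-per-point layout: a plain indexed find does the same job.
// Would run as db.points.find(pointFilter).sort(pointSort).
const pointFilter = { sensorId: "sensor-42", date: { $gte: from, $lt: to } };
const pointSort = { date: 1 };
```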
3) There are three approaches I could take for scenario (a):
- Replace the whole document (replaceOne / ReplaceOneModel)
- Update only the time-series part of the document (updateOne / UpdateOneModel)
- Use a bulk update or insert to update each array element
I am also thinking about index rebuild issues. Given that indexes are not linked directly to the data files in WiredTiger, approaches 1 and 2 both seem acceptable, with 2 being better since it updates only the target part of the document. Am I missing anything here?
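For reference, these are sketches of the three write shapes I mean, again in Node-driver/mongosh style with made-up collection and field names:

```javascript
const newSeries = [
  { date: new Date("2024-01-15T00:00:00Z"), value: 22.0 },
  { date: new Date("2024-01-15T00:01:00Z"), value: 22.3 },
];

// 1) Replace the whole document:
//    db.buckets.replaceOne({ _id: "sensor-42:2024-01-15" }, replacementDoc)

// 2) Update only the time-series field, leaving the rest of the doc untouched:
const partialUpdate = {
  filter: { _id: "sensor-42:2024-01-15" },
  update: { $set: { series: newSeries } },
};
// db.buckets.updateOne(partialUpdate.filter, partialUpdate.update)

// 3) One bulk write per data point (fits the point-per-document layout):
const bulkOps = newSeries.map((p) => ({
  updateOne: {
    filter: { sensorId: "sensor-42", date: p.date },
    update: { $set: { value: p.value } },
    upsert: true, // insert the point if it does not exist yet
  },
}));
// db.points.bulkWrite(bulkOps)
```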
4) In what scenarios should we actually go for the one-data-point-per-document approach?
I am not too familiar with how the storage engine or MongoDB itself works under the hood, so I would appreciate it if someone could shed some light on this. Thanks in advance.