tl;dr
How do I best manage safe insertion of timeseries delta documents? Enforced on the DB side: code could break and checking documents already in the DB takes long. Should I consider changing schema (I need to perform time-wise slice queries)?
detail
I have stock price in a time series (here modelled with a TS collection):
# mongosh
db.stocks.insertMany([{
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"price": 12.5
}, {
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
"timestamp": ISODate("2021-05-18T04:00:00.000Z"),
"price": 11.4
}])
Say I deserialize it to an object, add today's price and serialize it back. I now have
# to handle with pymongo
documents = [{
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"price": 12.5
}, {
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
"timestamp": ISODate("2021-05-18T04:00:00.000Z"),
"price": 11.4
}, {
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
"timestamp": ISODate("2021-05-18T08:00:00.000Z"),
"price": 11.1
}]
Option 1: unique compound index
I insert_many them all (reversed
) and let the duplicates to fail insertion. However, due to the reverse nature, the _id will not be insertion-time sorted in the case that the delta documents are more than one. So, need to call .sort({"timestamp":1})
for every query.
Moreover, if I want to use a timeseries collection (mongodb>=5.0), unique indices are not supported.
# mongosh
db.stocks.createIndex({"metadata.InstrumentId":1,"timestamp":1},{"unique":true})
>>> MongoServerError: Unique indexes are not supported on collections clustered by _id
Option 2: upsert
I re-write existing documents, and could add an update date. However, update is not supported on timeseries collections. Moreover, I have to call the update in a for loop, which should be less efficient:
for datapoint in datapoints:
db.stocks.update_one(
{
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"metadata": [{"instrumentId": 5578}, {"FX": "USD"}, {"lastUpdate": ISODate(now)}]
},
{"$set":{
"price": 12.5
}
},
{"upsert":True})
At the moment my best guess is option 1 with a normal collection.
Better ideas?