1

tl;dr

How do I best manage safe insertion of timeseries delta documents? Enforced on the DB side: code could break and checking documents already in the DB takes long. Should I consider changing schema (I need to perform time-wise slice queries)?

detail

I have stock price in a time series (here modelled with a TS collection):

# mongosh
db.stocks.insertMany([{
   "metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
   "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
   "price": 12.5
}, {
   "metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
   "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
   "price": 11.4
}])

Say I deserialize it to an object, add today's price and serialize it back. I now have

# to handle with pymongo
documents = [{
   "metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
   "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
   "price": 12.5
}, {
   "metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
   "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
   "price": 11.4
}, {
   "metadata": [{"instrumentId": 5578}, {"FX": "USD"}],
   "timestamp": ISODate("2021-05-18T08:00:00.000Z"),
   "price": 11.1
}]

Option 1: unique compound index

I insert_many them all (reversed) and let the duplicates to fail insertion. However, due to the reverse nature, the _id will not be insertion-time sorted in the case that the delta documents are more than one. So, need to call .sort({"timestamp":1}) for every query.

Moreover, if I want to use a timeseries collection (mongodb>=5.0), unique indices are not supported.

# mongosh
db.stocks.createIndex({"metadata.InstrumentId":1,"timestamp":1},{"unique":true})
>>> MongoServerError: Unique indexes are not supported on collections clustered by _id

Option 2: upsert

I re-write existing documents, and could add an update date. However, update is not supported on timeseries collections. Moreover, I have to call the update in a for loop, which should be less efficient:

for datapoint in datapoints:
    db.stocks.update_one(
        {
            "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
            "metadata": [{"instrumentId": 5578}, {"FX": "USD"}, {"lastUpdate": ISODate(now)}]
        },
        {"$set":{
                "price": 12.5
            }
        },
        {"upsert":True})

At the moment my best guess is option 1 with a normal collection.

Better ideas?

C. Claudio
  • 177
  • 13
  • `_id` is a unique timestamp field (see [ObjectId.getTimestamp()](https://docs.mongodb.com/manual/reference/method/ObjectId.getTimestamp/), why not use that? I'm assuming your `timestamp` field is set to "now". – Paul Aug 14 '21 at 18:44
  • Right, I forgot to say that `timestamp` is not "now", prices can be written async from current time. – C. Claudio Aug 14 '21 at 18:53
  • An improvement to option 1 can be `my_collection.insert_many(to_insert, ordered=False)` as [here](https://stackoverflow.com/a/44614420/9671120). – C. Claudio Aug 14 '21 at 19:20
  • 1
    Depending on how many documents you are writing/updating, [bulkWrite()](https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/) may be worth a look. – Paul Aug 14 '21 at 19:26

1 Answers1

0

The best answers I could come up with is my_collection.insert_many(to_insert, ordered=False) or directly call BulkWrite as suggested by Paul (still, unordered upsert).

C. Claudio
  • 177
  • 13