problem

I have a main register and x sub-registers that stream data to it. I sync the sub-registers with the main register at night: during the day, data is created in x1, x2, ..., xn, and at night all of it is sent to the main register.

It is not very probable, but x1 and x2 could, for example, generate the same _id. If I then sync them, the first register would create the document and the second would upsert it, overwriting the first register's document.

solutions

current

To prevent that, I currently save the original _id in a field called refId. This is awful, because I can't use references or most of the functions on my main register.

option I - autoincrement

I planned to switch to autoincrement _ids but I read that this is something you shouldn't do if you plan to scale:

https://blog.serverdensity.com/switching-to-mongodb-and-auto-increment/
https://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

option II - modifying the sub-register _id

I already save which sub-register each document came from. So I could check whether an _id already exists but belongs to another sub-register. If so, I could generate a new _id for the new document and send the new _id back to the sub-register. This sounds easy at first glance, but I have a lot of references in my documents.

question

What is a good way to handle possible duplicate ids? Is there an easier but still effective solution, e.g. automatically prefixing the 24-character ObjectId?

Andi Giga
  • Have you observed any duplicate ObjectIds? According to the answer on this post, duplicates are highly unlikely: http://stackoverflow.com/questions/4677237/possibility-of-duplicate-mongo-objectids-being-generated-in-two-different-colle – Alex Dec 07 '15 at 13:09

2 Answers

As it is unlikely (although not impossible) that two identical ObjectIds are generated in different sub-registers, you could handle duplicate ObjectIds in your error handler. If there is a unique key violation and the source register differs, generate a new ObjectId. As the error handler will not be called frequently, you can add more complex logic without affecting overall performance.

You could, for example, get a new ObjectId by cloning the source record and deleting the original. Alternatively, you can create your own ObjectId and replace the machineId with a registerId.
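
The clone-under-a-new-id idea can be sketched without a database. This is only an illustration under stated assumptions: `mainRegister`, `newId` and `syncDocument` are hypothetical names, the `Map` stands in for the main collection, and the lookup stands in for the duplicate key error (code 11000) that MongoDB would raise on insert.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for the main register: _id -> document.
const mainRegister = new Map();

// Generates a random 24-hex-character id. A real application would use the
// driver's ObjectId constructor; this keeps the sketch dependency-free.
function newId() {
  return crypto.randomBytes(12).toString('hex');
}

// Stores a document coming from a sub-register and returns the _id under
// which it was stored. The returned _id differs from doc._id only when the
// same _id already exists for a different sub-register.
function syncDocument(doc) {
  const existing = mainRegister.get(doc._id);
  if (!existing || existing.source === doc.source) {
    // No clash, or a re-sync of the same document from the same sub-register.
    mainRegister.set(doc._id, { ...doc });
    return doc._id;
  }
  // Same _id, different sub-register: store a clone under a fresh _id.
  let freshId = newId();
  while (mainRegister.has(freshId)) freshId = newId();
  mainRegister.set(freshId, { ...doc, _id: freshId });
  return freshId;
}

const idA = syncDocument({ _id: 'aaaaaaaaaaaaaaaaaaaaaaaa', source: 'x1', name: 'first' });
const idB = syncDocument({ _id: 'aaaaaaaaaaaaaaaaaaaaaaaa', source: 'x2', name: 'second' });
```

The returned id is what would have to be sent back to the sub-register so that it can update its own references.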

Alex
  • Ok cool, that would be option II. In case of identical ids, do I change the original _id? Or, more simply, send back an error message instead of saving and implement the logic there? I didn't know that the id is generated that smartly. – Andi Giga Dec 07 '15 at 13:30
  • Yes, the simplest way to resolve the conflict is to clone the source record and delete the original (you can't update the ObjectId). – Alex Dec 07 '15 at 13:38
  • If I have these two `_id`s: `555d8f f842 e7470b 447d8028` & `555d8f f842 e7470b 447d8029`, I can say that `447d8029` is the timestamp, `e7470b` is the machine, `f842` is the process and `555d8f` is the counter. https://docs.mongodb.org/manual/reference/object-id/#ObjectIDs-BSONObjectIDSpecification When exactly is the counter modified? It seems equal on both entries. – Andi Giga Dec 07 '15 at 13:38
  • Is there a way to go through all collections at once and update every reference to that _id? Or will I have to search through each collection and modify it with e.g. `findOneAndUpdate`? I have 6 collections which all have references to each other, so I would have to do it for every reference in every collection. – Andi Giga Dec 07 '15 at 13:45
  • You will have to search through each collection and update the id. It is slow, but it should not be called very often. – Alex Dec 07 '15 at 13:55
  • Just to ask: if I have full control over all instances of my application, I could probably modify the processId and give each sub-register a defined processId. Then I would avoid collisions for sure. Is there a method (because I didn't find one) to set a process id/machine id? Otherwise I could do this: `var user = new UserModel(); var id = user._id; user._id = id.slice(0,6) + "" + id.slice(10,24);` I could define the `` as `process.env.INSTANCEID` and write the method into `schema.pre('save', function(next) {` – Andi Giga Dec 07 '15 at 15:00
  • Creating a custom ObjectId would work. I don't think you can set a defined processId. – Alex Dec 07 '15 at 15:14
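
Going by the ObjectId specification linked in the comments, the fields appear in the order timestamp (4 bytes), machine (3 bytes), process (2 bytes), counter (3 bytes), so the two ids quoted above split differently from the grouping suggested there, and they differ in the counter. A minimal sketch, assuming that layout (`parseObjectId` is a hypothetical helper; newer drivers use a 5-byte random value in place of the machine and process fields):

```javascript
// Splits a 24-hex-character ObjectId string into the four fields of the
// layout described in the linked specification (assumed here).
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error('expected 24 hex characters');
  return {
    timestamp: parseInt(hex.slice(0, 8), 16), // seconds since the Unix epoch
    machine: hex.slice(8, 14),
    process: hex.slice(14, 18),
    counter: hex.slice(18, 24),
  };
}

const first = parseObjectId('555d8ff842e7470b447d8028');
const second = parseObjectId('555d8ff842e7470b447d8029');

// Both ids share timestamp, machine and process; only the counter differs.
console.log(first.counter, second.counter);
```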
The final solution is to create a custom ObjectId. As described in https://docs.mongodb.org/manual/reference/object-id/#ObjectIDs-BSONObjectIDSpecification, the _id consists of several parts, which makes it very unlikely that two _ids collide even if they are generated on different machines. To have more control over this process, I set the process id section of the _id myself: each sub-register gets its own process id.

To achieve that, I wrote two functions:

1) Check whether an override value for the _id is available and valid, and whether the register is a local one (no modification is required on the main register). Entries are only generated on my local register.

2) Modify the _id

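mongoose = require('mongoose')
ObjectId = mongoose.Types.ObjectId
# getType() returns 'local' or 'national'
registerType = require('./register-config.js').getType()

# Use the override only on a local register, and only if a value of
# exactly 4 hex characters (2 bytes) is configured.
exports.useProcessId = ->
  /^[0-9a-f]{4}$/i.test(process.env.REGISTER_PROCESS_ID ? '') and registerType is 'local'

# Layout per the linked spec, in hex characters:
# 0-8 timestamp, 8-14 machine id, 14-18 process id, 18-24 counter.
# Only the process id section is replaced; slicing at 6 and 10 would
# overwrite the last timestamp byte and the first machine id byte instead.
exports.changeMongooseId = (data, next) ->
  id = data._id.toString()
  data._id = new ObjectId(id.slice(0, 14) + process.env.REGISTER_PROCESS_ID + id.slice(18, 24))
  next()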

These functions are called before I save a new document to the collection:

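async = require('async')
# Assumed file name for the module that exports the two functions
processIdConfig = require('./process-id-config')

instituteSchema.pre('save', (next) ->
  data = @  # the document being saved
  tasks =
    eprd: (done) ->
      validateEPRD(data, done)
    changeMongooseId: (done) ->
      # useProcessId is a function, so it has to be called
      if processIdConfig.useProcessId() then processIdConfig.changeMongooseId(data, done) else done()
  async.parallel tasks, (err) ->
    return next(new Error(err)) if err?
    next()
)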

I don't need to call the function on updates etc., because by then the document already has its unique _id.
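
The same substitution can be tried on plain hex strings, independent of mongoose. This is a sketch under assumptions, not the exact method used above: `withRegisterId` and the register values `0001` and `0002` are hypothetical, and the process id is taken to occupy hex characters 14 to 18, as in the layout from the linked specification. The offsets matter: the timestamp occupies the first 8 hex characters, so a replacement that starts before character 8 would alter the creation time stored in the id.

```javascript
// Returns the id with its process id section (hex characters 14..18 in the
// assumed layout) replaced by a per-register value of 4 hex characters.
function withRegisterId(idHex, registerId) {
  if (!/^[0-9a-f]{24}$/i.test(idHex)) throw new Error('id must be 24 hex characters');
  if (!/^[0-9a-f]{4}$/i.test(registerId)) throw new Error('registerId must be 4 hex characters');
  return idHex.slice(0, 14) + registerId.toLowerCase() + idHex.slice(18);
}

// The same generated id, as it would look on two different sub-registers.
const original = '555d8ff842e7470b447d8028';
const fromX1 = withRegisterId(original, '0001');
const fromX2 = withRegisterId(original, '0002');
```

As long as every sub-register is given a different value, two ids that would otherwise be identical end up differing in that section, while the timestamp and counter are left untouched.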

Andi Giga