0

I'm working on a ASP.Net app, that is not really a distributed application per se, but at some point will have all of its data synchronized to the master node.

In order to be able to store the data from different nodes on one unique table without having colision of ids, the approach that was taken was:

  • I. Not use auto generated ids
  • II. The row Id would be composed by a concatenation of the NodeId + NextRowtId

The NextRowId is generated by:

  1. Selecting the highest id from one specific node,
  2. Splitting it into 2 parts, the first part being the NodeId and the second the being the LastDocumentId
  3. Incrementing the LastDocumentId
  4. Concatenate the NodeId with the incremented LastDocumentId

Eg
Id = 20099, split into (NodeId = 200, LastDocumentId = 99)

LastDocumentId + 1 = 100

NextRowId = 200100

This works perfectly in theory, or if the requests are processed in a sequential way. However, if multiple requests are processed at same time they often end up generating the same id.

So in practice if multiple there is a collision of ids when multiple users try to update the same table at the same time.

Id colision in case of concurrent use

I have had a look at the best practices on generating unique ids for distributed systems. However, none of them is a viable option at this point in time, as they would require a rethinking of the whole architecture and lots and lots of refactoring. Both require time which management will not allow me to take.

So what are the other ways that I can ensure that ids generated are unique or that the requests are processed in a sequential way? All this, ideally without having to restructure the application or cause performance bottlenecks.

Ferox
  • 461
  • 4
  • 13
  • Do you have a requirement for those ids to be sequential without gaps? If not - just use sql server sequence. – Evk May 05 '18 at 16:06

1 Answers1

2

Create a unique constraint on your key column. If you happen to insert the same id twice, catch the exception and regenerate your id.

You probably want to use Guids instead.

That said, if you need to know to which node your data is associated, you should model your database according to it: Have 2 columns NodeId and DocumentId. You can also generate a Unique Constraint above multiple columns.

Christian Gollhardt
  • 16,510
  • 17
  • 74
  • 111
  • 1
    I would advise against GUIDs. It's better to use [Sequence](https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql) or just `identity`. However, I don't know whether EF supports Sequence. – JohnyL May 05 '18 at 16:27
  • I would advise for GUIDs. Each node can generate a GUID for each record that it has, safe in the knowledge that the identifier _will not_ conflict with anything else in the system. Even if the data synchronisation to the master node takes place a long time after creation of the entity, the node will not have to worry about the master node rejecting the generated id. It also eliminates the need to read back the `identity integer` from the database after creating each record before creating any children. – Trevor May 05 '18 at 23:13
  • @Christian-Gollhardt thank you very much for your answer. We did use the unique constraint, but instead of going with the catch resave approach we opted to used sequences as suggested by JohnyL. It will require more work but it ensures that each DocumentId is unique and eliminates the risk of colision. – Ferox May 07 '18 at 08:33
  • @JohnyL I did nto find an straight forward way to use sequences with EF. But you can make use of it if you use EF raw queries. – Ferox May 07 '18 at 08:38
  • @Ferox Yes, this would be last resort, but would be nice to have it "in-box" :) – JohnyL May 07 '18 at 08:45
  • @Trevor the collision chance is relative small. [You have 5 quadrilion Guids per Star in the universe.](https://stackoverflow.com/a/2977648/2441442). If a collision occur, I wouldn't even catch the Exception and simple rollback at highest level. – Christian Gollhardt May 07 '18 at 13:01