What is the convention for ensuring unique IDs remain unique when uploading to a cloud database?

Question

Say two users each have an entry on their local device, and by some chance both entries were given the same generated UUID. What should happen when both users try to upload them to a central database?

Is it normal to just "re-id" one of the entries?

The database should generate its own unique ids in this case. Having users create their own "unique" id isn't really going to work. — Gordon Linoff, Nov 11 '19 at 16:55
@GordonLinoff Sorry, I should've specified. The users aren't creating their own IDs, but if my offline device creates the same ID that your offline device created, and we both try to upload them, what happens? — microflakes, Nov 11 '19 at 16:56
Look for GUID, that's what you are looking for. [Read this](https://stackoverflow.com/questions/18954130/can-we-use-guid-as-a-primary-key-in-sqlite-database) — Antonio Veneroso Contreras, Nov 11 '19 at 17:02
I'm aware, but is the likelihood of a duplicate so negligible that my application can just ignore it? — microflakes, Nov 11 '19 at 17:04
Depending on data volume and number of clients, you can assign each client a large enough range of, say, 1 million rowids to use... but, yeah, it's better to create new ones when uploading data. — Shawn, Nov 11 '19 at 17:21
Look at it this way @microflakes - the chance of a GUID collision is much less than the chance of you winning the lottery and not having to worry about it any more. So I would ignore it. — TomC, Nov 11 '19 at 23:31
Just to clarify: Are you having each local device generate unique IDs, or will the cloud database generate unique IDs? Will each local device generate a unique ID for each document it creates? — Peter O., Nov 12 '19 at 03:07
I was having the devices create them. I can see how having the cloud database generate them would eliminate the problem, but being able to use this app offline is an important feature. In fact, uploading to a central database is a very minor feature in my case. — microflakes, Nov 12 '19 at 18:58

score 1 · Accepted Answer · answered Nov 12 '19 at 03:22

The chance of two devices generating duplicate UUIDs is so small that most systems will ignore the possibility. From How unique is UUID?:

after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

This of course depends on the volume of data being generated by the devices. If they are sensors that generate huge volumes of records (e.g. on aeroplane engines), then a non-unique UUID starts to become a realistic possibility.
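To put rough numbers on that, here is a minimal sketch of the standard birthday approximation. It assumes random (version-4) UUIDs with 122 random bits and a good random source on every device; the function name `collision_probability` is made up for this example.

```python
import math

RANDOM_BITS = 122  # a version-4 UUID has 122 random bits (6 of the 128 are fixed)

def collision_probability(n_ids: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that n_ids random IDs contain at least one duplicate."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

# A modest application: one million entries in total.
small = collision_probability(10**6)

# The extreme case: one billion IDs per second for 86 years.
huge = collision_probability(10**9 * 60 * 60 * 24 * 365 * 86)

print(f"{small:.3g} {huge:.3g}")
```

For a million entries the result is vanishingly small, and it takes on the order of a century at a billion IDs per second before the probability approaches one half, which is in line with the quoted figure.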

If your volumes are that high and you need a hard guarantee of uniqueness in the IDs, then, as others have mentioned, you'll need to do one of the following:

- assign each device a (large) unique range in which to generate IDs
- regenerate all IDs when data is merged centrally
- check all incoming IDs against existing IDs and modify any duplicates to make them unique (this could be a very expensive operation)
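As an illustration of the first option, here is a rough sketch in which each device owns a disjoint block of integer IDs. Everything in it is an assumption made for the example: the class name `DeviceIdAllocator`, the 40-bit counter width, and the idea that the central server hands each device a distinct `device_number` once (for instance at first sign-in), after which the device can work offline.

```python
from dataclasses import dataclass

COUNTER_BITS = 40  # assumption: each device may create up to 2**40 IDs


@dataclass
class DeviceIdAllocator:
    device_number: int      # hypothetical: assigned once by the central server
    next_counter: int = 0   # must be persisted on the device between runs

    def new_id(self) -> int:
        """Return the next ID from this device's private range."""
        if self.next_counter >= 1 << COUNTER_BITS:
            raise OverflowError("device has used up its ID range")
        # High bits identify the device, low bits count its entries,
        # so two devices with different numbers can never collide.
        new = (self.device_number << COUNTER_BITS) | self.next_counter
        self.next_counter += 1
        return new


phone = DeviceIdAllocator(device_number=1)
tablet = DeviceIdAllocator(device_number=2)
phone_ids = {phone.new_id() for _ in range(1000)}
tablet_ids = {tablet.new_id() for _ in range(1000)}
```

The trade-off is that a device needs one online step before it can create entries, which may or may not be acceptable for an offline-first app.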

Could you explain how your 3rd option is more expensive than your 2nd? If I wanted to compare all incoming IDs, wouldn't a simple query just tell me if it existed? It seems like both your 2nd and 3rd options execute under the same conditions, so how is 2 < 3? — microflakes, Nov 12 '19 at 19:03
In the second option you execute an operation to create a new ID for each incoming record; the number of incoming records is likely to be an order of magnitude smaller than the centralised data that has already been merged. In the third option you would need to scan the IDs in *all existing data* (or an index of IDs) to check whether an incoming ID is a duplicate of an existing ID. Over time this operation will become more expensive as the volume of merged data grows; depending on the volume, it might eventually become prohibitively expensive. — Nathan Griffiths, Nov 12 '19 at 21:26

Peter O. · Answer 2 · 2019-11-12T15:21:00.613

It seems that you're letting multiple devices generate IDs which should be unique for the entire application.

If you can check for entry upload conflicts, and prevent the upload of an entry with an existing unique ID, then you can handle the error by generating another ID and trying again. (This may be viable especially if you can distinguish the entries by the users that are uploading those entries, and not just by their unique IDs.) If that is an option for you, then it's enough to use random numbers (such as random UUIDs) as unique entry IDs. Random IDs are also appropriate if you can tolerate the risk of generating the same identifier for different entries.
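The "generate another ID and try again" approach might look something like the sketch below. It uses an in-memory SQLite database as a stand-in for the central database, and the table layout, column names, and the function `upload_entry` are all hypothetical. The primary-key constraint does the conflict check, so no separate lookup query is needed.

```python
import sqlite3
import uuid


def upload_entry(conn, entry_id, owner, payload, max_attempts=3):
    """Insert an entry; on an ID conflict, re-ID it and try again.

    Returns the ID the entry was finally stored under, so the caller
    can update its local copy if the ID changed.
    """
    for _ in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO entries (id, owner, payload) VALUES (?, ?, ?)",
                    (entry_id, owner, payload),
                )
            return entry_id
        except sqlite3.IntegrityError:
            # The ID is already taken: pick a fresh random UUID and retry.
            entry_id = str(uuid.uuid4())
    raise RuntimeError("could not find a free ID")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY, owner TEXT, payload TEXT)"
)

# Simulate the unlikely case: two users upload entries with the same UUID.
shared = str(uuid.uuid4())
first = upload_entry(conn, shared, "alice", "note A")
second = upload_entry(conn, shared, "bob", "note B")
stored = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

A real service would also want to tell a genuine collision apart from the same user re-uploading the same entry, for example by comparing the owner of the existing row; this sketch does not attempt that.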

See also what I have to say about unique random identifiers.
