Android (distributed application) primary key strategy

Question

I am going to implement a distributed application with multiple mobile clients and a web-based server application. Each client, and also the server, is allowed to generate table entries. Therefore I need primary keys that are unique across all participants, AND I want to be able to generate keys offline.

What is the best approach for generating primary keys in distributed environments? For a similar question see What is the best primary key strategy for an online/offline multi-client mobile application with SQLite and Azure SQL database as the central store?

I am aware that UUID key generation is a good approach for that scenario, but I want to stick to a key with name _id and type long as suggested by the Android platform.

I don't want to have a composite id with device (also server is a device) id and local id. This approach wouldn't work well anyway since the server should be able to generate entries for a certain client. In that case I would have to use the device id also on the server.

Therefore my current favorite is to build my key with datatype long (I did this before in another project). I think I will use the high/low approach (see for example here What's the Hi/Lo algorithm?) and have a key which consists of:

client id (e.g. ~28 bits) generated from the server
low value (e.g. ~ 4 bits) incremented on client, never persisted
high value (e.g. ~ 32 bits) incremented on client, persisted on client only
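
A minimal sketch of how the three parts could be packed into one 64-bit value. It assumes the example widths above and a field order of client id, high value, low value from most to least significant bit; the question does not fix the order, and the names pack_key and unpack_key are made up for illustration.

```python
CLIENT_BITS, HIGH_BITS, LOW_BITS = 28, 32, 4  # example widths from the list above


def pack_key(client_id: int, high: int, low: int) -> int:
    """Combine the three parts into one unsigned 64-bit integer."""
    if not 0 <= client_id < (1 << CLIENT_BITS):
        raise ValueError("client id out of range")
    if not 0 <= high < (1 << HIGH_BITS):
        raise ValueError("high value out of range")
    if not 0 <= low < (1 << LOW_BITS):
        raise ValueError("low value out of range")
    # Assumed layout: [client id | high | low], most significant bits first.
    return (client_id << (HIGH_BITS + LOW_BITS)) | (high << LOW_BITS) | low


def unpack_key(key: int) -> tuple:
    """Split a packed key back into (client_id, high, low)."""
    low = key & ((1 << LOW_BITS) - 1)
    high = (key >> LOW_BITS) & ((1 << HIGH_BITS) - 1)
    client_id = key >> (HIGH_BITS + LOW_BITS)
    return client_id, high, low
```

Note that 28 + 32 + 4 uses all 64 bits. In a signed 64-bit long, client ids of 2^27 and above would set the sign bit and produce negative keys, so one of the fields may need to be a bit narrower in practice.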

The client id must be fetched from the server at first start of the mobile application. So the first start needs a network connection. This might be a downside of this approach. When having the client id on the device I can generate keys without a network connection.

Normally the high id is a unique value over the database. When a user uninstalls the application and installs it again, I have to treat them as a new client and give them a new client id. Otherwise I would have to save the current high id on the server to be able to restore it after loss or reinstallation, which is not worth the effort.

What is the best approach for getting the high id on Android? An autoincrement key is not a solution. I need something like a generator function, and it has to be executed inside its own transaction (not the "user" transaction). Does anyone have experience with that approach on Android and can point me in the right direction? (I only found this answer).
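
To make the requirement concrete, here is a rough sketch of such a generator written against SQLite, using Python's sqlite3 module rather than the Android API. It is an illustration under assumptions, not a tested Android solution: the table name key_gen, the class HiLoGenerator, and the key layout (client id shifted by 36 bits, 4 low bits) are all invented for the example. The generator uses its own connection so that persisting a new high value commits independently of the "user" transaction.

```python
import sqlite3

HIGH_BITS, LOW_BITS = 32, 4  # 16 keys per persisted high value, as in the example


class HiLoGenerator:
    def __init__(self, db_path: str, client_id: int):
        # Dedicated connection: its commits do not touch other connections' transactions.
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS key_gen ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), next_high INTEGER NOT NULL)"
        )
        self._conn.execute("INSERT OR IGNORE INTO key_gen (id, next_high) VALUES (1, 0)")
        self._conn.commit()
        self._client_id = client_id
        self._high = None  # fetched lazily; the low value is never persisted
        self._low = 0

    def _next_high(self) -> int:
        # Read and bump the persisted counter, then commit (or roll back on error).
        with self._conn:
            (high,) = self._conn.execute(
                "SELECT next_high FROM key_gen WHERE id = 1"
            ).fetchone()
            self._conn.execute(
                "UPDATE key_gen SET next_high = ? WHERE id = 1", (high + 1,)
            )
        return high

    def next_key(self) -> int:
        if self._high is None or self._low == (1 << LOW_BITS):
            self._high = self._next_high()  # one database hit per 16 keys
            self._low = 0
        key = (self._client_id << (HIGH_BITS + LOW_BITS)) | (self._high << LOW_BITS) | self._low
        self._low += 1
        return key
```

Because the low value lives only in memory, a restart simply fetches a fresh high value and skips the unused low values of the previous one. The sketch assumes a single generator instance per database; several concurrent writers would need additional locking.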

What key strategy are you using for your multi client application (online and offline)?

score 3 · Answer 1 · answered Jun 03 '14 at 16:12

I offered two bounties on this question and didn't find the answer I was looking for. But I spent some time thinking about the best solution, and maybe the question was not open enough and focused too much on the solution I had in mind.

However, there are a lot of different strategies available. Now (after the second bounty) I think the first question to answer is: which data model(s) do you have in your distributed environment? You might have

1) the same data model (or a subset of it) on client and server
2) different data models on client and server

If you answer with 1), then you can choose your key strategy from

using GUID
using my approach High/Low
mapping keys as @user3603546 suggested

If you answer with 2), then only the following comes to mind

composite id

I never liked composite ids, but when I think about it (and don't call it a composite id anyway), it could be a possible solution. I want to sketch this solution in the following:

Glossary:

<client key> ... primary key generated at the client side, so the client chooses the implementation (long _id for Android)
<server key> ... primary key generated at the server side, so the server chooses the implementation
<client id> ... ID for identifying the client
<device id> ... ID for identifying the device, there is a 1-n relation between client and device

Solution:

Use it only if you have a client data model and a server data model
The client data model has the fields
- <client key> primary key
- <server key> nullable data field
The server data model has the fields
- <server key> as primary key
- <client key> nullable data field
- <client id> as mandatory data field to distinguish the client
When synchronizing from server to client, generate the missing <client key> on the client and mark the entry as dirty (so that the newly generated <client key> eventually reaches the server)
When synchronizing from client to server, generate the missing <server key> on the server before saving the entry
The mapping between client and server data model can be handled by specialised frameworks like Dozer or Orika. However, the key generation must be integrated into the mapping step.
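
The two synchronization directions above can be sketched roughly as follows. This is only an illustration: the class and function names are invented, the server key is assumed to be a UUID string, and a simple counter stands in for whatever key generator the client actually uses.

```python
import uuid
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class ClientRow:
    client_key: int                    # primary key on the client
    server_key: Optional[str] = None   # nullable until the first sync
    dirty: bool = False


@dataclass
class ServerRow:
    server_key: str                    # primary key on the server
    client_id: int                     # mandatory, distinguishes the client
    client_key: Optional[int] = None   # nullable for rows created on the server


_client_keys = count(1)  # stand-in for the client's local key generator


def to_server(row: ClientRow, client_id: int) -> ServerRow:
    """Client -> server: generate the missing server key before saving."""
    server_key = row.server_key or str(uuid.uuid4())
    return ServerRow(server_key, client_id, row.client_key)


def to_client(row: ServerRow) -> ClientRow:
    """Server -> client: generate the missing client key and mark the row dirty."""
    if row.client_key is None:
        return ClientRow(next(_client_keys), row.server_key, dirty=True)
    return ClientRow(row.client_key, row.server_key)
```

The dirty flag is what causes the freshly generated client key to be sent back on the next client-to-server sync.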

I never liked this solution because I always thought in terms of the server data model. I have entities which live only on the server, and I always wanted to create these entities on the client, which would not be possible. But when I think in terms of the client data model, I might have one entity, e.g. Product, which results in two entities (Product and ClientProduct) on the server.

I like this answer, the only thing I don't think is necessarily true is that the server has to have a nullable client key. If objects/rows can be generated on the server it can just generate the client key, which the client can accept unconditionally on receiving back the response or when syncing. The only reason I could think of why that wouldn't work is if the client needs some format for client key, but I believe these keys should be opaque. — darvelo, May 30 '23 at 09:59

score 2 · Answer 2 · answered Apr 28 '13 at 15:07

This is more questions than answers...

It does make things easier if you can auto-generate all your id's, so you don't have to fetch them from the server and worry about whether you have a connection. You mention that you can't take the common approach (UUID or ANDROID_ID) because you will be using a long "as suggested by the Android platform".

Are you referring to the fact that Android assumes that your SQLite tables will have a long _id primary key?

Are you using a datastore or an SQL database on your server?

If you are using a datastore with hierarchical keys (e.g. Google Datastore), how about using UUID/ANDROID_ID as the client id and a long as the data item id? Then on the client you just store the long, and on the server your entities are stored with a key path of UUID/long.

Why do you write that the "high id must be a unique value over the database"? Since it is prepended with the client id, perhaps you mean that it must be unique on the local database?

To handle your problem that the user could uninstall and reinstall the app, why not pursue your idea of "save the current high id on the server to be able to restore it on loss or on reinstallation"? Since you already plan to retrieve the client id on first run (and can't assign ids until you have it), you might as well also ask the server for the next available high id.

Do your entities have some other key material from which you could generate a 32-bit hash for your high id? Assuming that the high id only needs to be unique on a particular client (and assuming you won't have a massive number of entities on a client), I think you would be unlikely to get a collision if you have decent key material and use a hash function that minimizes collisions.
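
How likely a collision is can be estimated with the standard birthday approximation. The sketch below assumes the hash values are uniformly distributed over the 32-bit space; the function name is made up for illustration.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n uniformly random hashes collide."""
    space = 2 ** bits
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2 * space))
    return -math.expm1(-n * (n - 1) / (2 * space))
```

Under that assumption, 10,000 entities on one client give a collision chance of roughly 1%, and the chance reaches about 50% at around 77,000 entities, so a hash-based high id would still need a collision check.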

And FYI, here's a question in which someone considers the possibility of not using just a long id, in a case with some similarities to yours. http://stackoverflow.com/questions/14184861/android-use-uuid-as-primary-key-in-sqlite – Tom, Apr 28 '13 at 15:35
_id: I want to stick to the convention, otherwise I lose functionality, see http://stackoverflow.com/a/4314161/734687 – ChrLipp, Jul 01 '13 at 08:51
Server: I would like to choose an implementation where it doesn't matter: SQL database AND NoSQL – ChrLipp, Jul 01 '13 at 08:52
high id must be a unique value over the database: of course only the generated id must be unique within its table, but the high id is normally produced by a generator which produces unique values for the whole database – ChrLipp, Jul 01 '13 at 08:54

user3603546 · Answer 3 · 2014-06-03T08:03:04.717

From my experience: use local IDs on the device and separate IDs on the server. Every time you communicate data over the wire, convert from one to the other. This will actually clarify the process and ease debugging. The conversion routines stay small, are well isolated and represent a natural element in the application architecture. The data travelling over the wire is expected to be relatively small, anyway, and ID conversion will not be a big overhead. Also, the amount of data being kept on the mobile device is, presumably, small (the bulk is on the server).

I propose to do the conversion on the device with a simple table local_ID<->server_ID. The server only needs to provide one procedure: generate a batch of keys, say 444 new keys. The mobile device then assigns these to its local IDs and sends data to the server with server_IDs only. The conversion table can occasionally be purged of unused IDs, and local IDs can be reused, so 32-bit integers will definitely suffice.
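
A possible shape for that conversion table, sketched with SQLite through Python's sqlite3 module. The table names id_map and id_pool and all function names are assumptions made for the example; the pool holds the batch of keys fetched from the server until they are assigned.

```python
import sqlite3


def open_map_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS id_map ("
                 "local_id INTEGER PRIMARY KEY, server_id INTEGER UNIQUE NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS id_pool (server_id INTEGER PRIMARY KEY)")
    return conn


def add_to_pool(conn: sqlite3.Connection, server_ids) -> None:
    """Store a batch of unused keys received from the server."""
    with conn:
        conn.executemany("INSERT INTO id_pool (server_id) VALUES (?)",
                         [(s,) for s in server_ids])


def to_server_id(conn: sqlite3.Connection, local_id: int) -> int:
    """Return the server id for a local id, assigning one from the pool if needed."""
    row = conn.execute("SELECT server_id FROM id_map WHERE local_id = ?",
                       (local_id,)).fetchone()
    if row:
        return row[0]
    with conn:  # assignment and pool removal are committed together
        pooled = conn.execute("SELECT MIN(server_id) FROM id_pool").fetchone()[0]
        if pooled is None:
            raise LookupError("key pool exhausted; fetch a new batch from the server")
        conn.execute("DELETE FROM id_pool WHERE server_id = ?", (pooled,))
        conn.execute("INSERT INTO id_map (local_id, server_id) VALUES (?, ?)",
                     (local_id, pooled))
    return pooled


def to_local_id(conn: sqlite3.Connection, server_id: int):
    """Reverse lookup used when data arrives from the server; None if unknown."""
    row = conn.execute("SELECT local_id FROM id_map WHERE server_id = ?",
                       (server_id,)).fetchone()
    return row[0] if row else None
```

Only the synchronization code would call these functions; the rest of the client keeps working with local ids.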

Motivation

The tables stay small, the implementation stays optimal for the native device architecture and isolated from unpredictable architectural changes elsewhere, and there is a convenient point for debugging and tracing through which all data passes.

I had an application regenerate all IDs on every data file save and load. It was unexpectedly simple and fast, and it opened up other elegant possibilities like ID-space defragmentation and consolidation.

In your case, you can change the server technology with minimal changes to the client application. Since the client can operate offline anyway, it could use only the local IDs in most functions. Only the synchronization module would fetch and convert the server-IDs.

score 1 · Answer 4 · answered Jul 02 '13 at 18:07

Let me see if I get this straight: you need a 32 bit number that's unique to the device? Ok:

Create the number either randomly or by hashing the current nanotime. That'll get you a fairly unique value as it is.
Then ask the server if that number has already been used. If it has, generate the number again and ask again.

If you hash the nanotime, it is practically impossible (though not totally impossible: collision resistance isn't collision proof) to get the same number twice. Combined with the rest of your key, that would make it unique. This method doesn't require interaction with the server until the client actually needs to use the server. Say the client isn't connected at first start: generate the number, save it, and when the client does connect, before anything else happens, check whether the device id already exists. If it does, start from scratch. That way you can get a truly unique device ID.

If the client number isn't truly unique I always need a server call. It doesn't matter if I contact the server for validation or for generating my client id. In the first case I would have to correct all keys I generated in the meantime, so it's easier when the server generates the id. – ChrLipp, Jul 04 '13 at 08:31

score 1 · Answer 5 · answered May 30 '14 at 15:27

There is no way to know for certain that the keys you are generating on the client are unique on the server DB until you communicate with the server.

If you communicate up front to the server, before creating any records on the client side, then you can reserve a range of keys on the server. For example, the server could hand out keys in batches of 10,000. The client communicates with the server, and reserves the start of the next batch of available keys, say 60,000. The client is then free to create records with ids from 60,000 to 69,999. Once the client runs out of keys, it needs to request a new range of keys. If all the clients and the server reserve keys for themselves like this, then all generated ids will be unique in the server's database.
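
The range reservation described above could look roughly like this. Both classes are hypothetical and keep their state in memory; a real server would persist the next range start, and the reserve call would be a network request.

```python
class RangeAllocator:
    """Server side: hands out consecutive, non-overlapping key ranges."""

    def __init__(self, batch_size: int = 10_000, first: int = 0):
        self._batch_size = batch_size
        self._next_start = first

    def reserve(self) -> range:
        start = self._next_start
        self._next_start += self._batch_size
        return range(start, start + self._batch_size)


class KeyClient:
    """Client side: consumes its reserved range and asks for a new one when empty."""

    def __init__(self, reserve):
        self._reserve = reserve   # callable standing in for the server round trip
        self._ids = iter(())      # nothing reserved yet

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._reserve())  # requires a connection to the server
            return next(self._ids)
```

With first=60_000 and the default batch size, the first reservation covers the ids 60,000 to 69,999, matching the example in the text.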

Now if you create records on the client side before communicating with the server, then you would be stuck having to correct those ids once you get a reserved range from the server so that they are within that range, before you sync those records to the server.

I'm not sure why you are also trying to include a client id in the key; the server is assigning the high value, and this is enough to get unique keys generated on the client.

answered May 30 '14 at 15:27

Doug Simonton


You are suggesting Hi/Low with the high ID generated at the server. That's why I add a client ID: it allows me to generate the high ID on the client. – ChrLipp Jun 01 '14 at 18:42
Then the way you are implementing it, your client id is really the high id - it defines a server-assigned range of ids that will be reserved to be created by your client. And your high id is really the low id - you increment through these values until you run out of ids that can be created within your client range. – Doug Simonton Jun 02 '14 at 06:30
No, in the High/Low approach the low id is meant to prevent database access and therefore to speed up key generation. When you need your first ID, you fetch a high id, and then you can generate 2^(number of low bits) keys without accessing the database. – ChrLipp Jun 02 '14 at 06:33
However, I do not stick to the High/Low approach. I just want to learn the best key strategy for distributed mobile environments. – ChrLipp Jun 02 '14 at 06:35
Right, so I don't get why you have 3 different components. Why do you need to loop through high ids on the client side without communicating with the server? I think you have 3 components in your scheme (client id from the server, high and low) and you only need 2 to achieve what you are trying to achieve (high from the server and low). – Doug Simonton Jun 02 '14 at 06:36
The reason is performance. Each high id increment must be persisted, normally in its own transaction. The low id (in my example with 4 bits) allows me to hit the database for a new high id only every 16 ids. – ChrLipp Jun 02 '14 at 07:28
So why not just have a client id of 28 bits assigned by the server one time at first connection. The server hands out client ids sequentially, shifted above the lower 36 bits (1 << 36, 2 << 36, 3 << 36). Then the remaining lower 36 bits can be combined with the client id to generate keys: Client_id | 1, Client_id | 2, Client_id | 3. You will not have a key conflict with other clients, because the client id is part of the key, and there is no need to break down the remaining 36 bits into high/low or persist them on the server at any point, because ALL of those 36 bits under that client id belong to that client only. – Doug Simonton Jun 02 '14 at 08:14
