What is the best primary key strategy for an online/offline multi-client mobile application with SQLite and Azure SQL database as the central store?

Question

What primary key strategy would be best to use for a relational database model given the following?

tens of thousands of users
multiple clients per user (phone, tablet, desktop)
millions of rows per table (continually growing)

Azure SQL will be the central data store which will be exposed via Web API. The clients will include a web application and a number of native apps including iOS, Android, Mac, Windows 8, etc. The web application will require an “always on” connection and will not have a local data store but will instead retrieve and update via the api - think CRUD via RESTful API.

All other clients (phone, tablet, desktop) will have a local db (SQLite). On first use of this type of client the user must authenticate and sync. Once authenticated and synced, these clients can operate in an offline mode (creating, deleting and updating records in the local SQLite db). These changes will eventually sync with the Azure backend.

The distributed nature of the databases leaves us with a primary key problem and the reason for asking this question.

Here is what we have considered thus far:

GUID

Each client creates its own keys. On sync, there is a very small chance of a duplicate key, but we would need to account for it by writing functionality into each client to update all relationships with a new key. GUIDs are big, and when multiple foreign keys per table are considered, storage may become an issue over time. Likely the biggest problem is the random nature of GUIDs, which means that they cannot (or should not) be used as the clustered index due to fragmentation. This means we would need to create a separate (perhaps arbitrary) clustered index for each table.
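A minimal sketch of the GUID option on the client side, assuming Python's standard `sqlite3` and `uuid` modules stand in for whatever the native app actually uses. The `project` and `task` tables and the `new_key` helper are hypothetical names chosen for illustration.

```python
import sqlite3
import uuid


def new_key() -> str:
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


# In-memory database standing in for the local SQLite store.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE project (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE task (
        id         TEXT PRIMARY KEY,
        project_id TEXT NOT NULL REFERENCES project(id),
        title      TEXT NOT NULL
    );
""")

# Keys are generated offline; no round trip to the server is needed.
project_id = new_key()
task_id = new_key()
conn.execute("INSERT INTO project VALUES (?, ?)", (project_id, "Offline demo"))
conn.execute("INSERT INTO task VALUES (?, ?, ?)", (task_id, project_id, "First task"))

rows = conn.execute(
    "SELECT t.title FROM task t JOIN project p ON p.id = t.project_id"
).fetchall()
```

Because the keys never change after creation, the rows can be sent to the server as they are, and foreign keys need no rewriting on sync.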

Identity

Each client creates its own primary keys. On sync, these keys are replaced with server-generated keys. This adds complexity to the syncing process and forces each client to “fix” its keys, including all foreign keys on related tables.
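To make the "fix the keys" step concrete, here is a rough sketch under several assumptions: the client marks unsynced rows with negative temporary ids, the server returns a mapping from temporary id to server id, and the client keeps a hand-maintained `REFERENCES` table of which columns point at which table. All of these names and conventions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task (id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT);
    -- Rows created offline carry negative temporary ids (an assumed convention).
    INSERT INTO project VALUES (-1, 'Created offline');
    INSERT INTO task VALUES (-1, -1, 'First task'), (-2, -1, 'Second task');
""")

# Hypothetical, hand-maintained list of the foreign key columns that
# reference each table. It must be updated whenever the schema changes.
REFERENCES = {
    "project": [("task", "project_id")],
    "task": [],
}


def apply_server_ids(conn, table, id_map):
    """Replace temporary ids with server ids in `table` and in every referencing column."""
    with conn:  # one transaction: all replacements commit or none do
        for temp_id, server_id in id_map.items():
            conn.execute(
                f"UPDATE {table} SET id = ? WHERE id = ?", (server_id, temp_id)
            )
            for child, column in REFERENCES[table]:
                conn.execute(
                    f"UPDATE {child} SET {column} = ? WHERE {column} = ?",
                    (server_id, temp_id),
                )


# Mappings as they might come back from a sync response.
apply_server_ids(conn, "project", {-1: 5001})
apply_server_ids(conn, "task", {-1: 9001, -2: 9002})

tasks = conn.execute("SELECT id, project_id FROM task ORDER BY id").fetchall()
```

Even in this toy version the client needs schema knowledge (`REFERENCES`) just to sync, and any temporary id held in memory or in the UI would need the same treatment.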

Composite

Each client is assigned a client id on first sync. This client id is used in conjunction with a local auto-incrementing id as a composite primary key for each table. This composite key will be unique, so there should be no conflicts on sync, but it does mean that most tables will require a composite primary key. Performance and query complexity are the concerns here.
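A sketch of what the composite option might look like in the local SQLite schema. The table names, the `CLIENT_ID` value and the `next_local_id` helper are hypothetical, and `MAX(local_id) + 1` is a simplification: a real client would more likely persist a counter so that ids of deleted rows are never reused.

```python
import sqlite3

CLIENT_ID = 42  # hypothetical id handed out by the server on first sync

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE project (
        client_id INTEGER NOT NULL,
        local_id  INTEGER NOT NULL,
        name      TEXT NOT NULL,
        PRIMARY KEY (client_id, local_id)
    );
    CREATE TABLE task (
        client_id         INTEGER NOT NULL,
        local_id          INTEGER NOT NULL,
        project_client_id INTEGER NOT NULL,
        project_local_id  INTEGER NOT NULL,
        title             TEXT NOT NULL,
        PRIMARY KEY (client_id, local_id),
        FOREIGN KEY (project_client_id, project_local_id)
            REFERENCES project (client_id, local_id)
    );
""")


def next_local_id(conn, table):
    """Return the next local id for this client (simplified: highest id plus one)."""
    row = conn.execute(
        f"SELECT COALESCE(MAX(local_id), 0) + 1 FROM {table} WHERE client_id = ?",
        (CLIENT_ID,),
    ).fetchone()
    return row[0]


project_local = next_local_id(conn, "project")
conn.execute(
    "INSERT INTO project VALUES (?, ?, ?)", (CLIENT_ID, project_local, "Offline demo")
)
task_local = next_local_id(conn, "task")
conn.execute(
    "INSERT INTO task VALUES (?, ?, ?, ?, ?)",
    (CLIENT_ID, task_local, CLIENT_ID, project_local, "First task"),
)

# Every join has to match on both halves of the key.
rows = conn.execute("""
    SELECT p.name, t.title
    FROM task t
    JOIN project p
      ON p.client_id = t.project_client_id
     AND p.local_id  = t.project_local_id
""").fetchall()
```

The query-complexity concern shows up directly: each foreign key becomes two columns, and each join condition becomes two comparisons.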

HiLo (Merged Composite)

Like the composite approach, each client is assigned a client id (int32) on the first sync. The client id is merged with a unique local id (int32) into a single column to make an application-wide unique id (int64). This should result in no conflicts during sync. While there is more order to these keys than to GUIDs, since the ids generated by each client are sequential, there will be thousands of unique client ids, so do we still run the risk of fragmentation on our clustered index?
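The merge itself can be sketched as a shift and an OR. This assumes the client id occupies the high 32 bits and the local id the low 32 bits, with the client id kept below 2^31 so the result stays positive in a signed 64-bit column. The function names are hypothetical.

```python
def make_id(client_id: int, local_id: int) -> int:
    """Pack a client id (high 32 bits) and a local id (low 32 bits) into one integer."""
    if not 0 <= client_id < 2**31:
        raise ValueError("client_id must fit in 31 bits to keep the result positive")
    if not 0 <= local_id < 2**32:
        raise ValueError("local_id must fit in 32 bits")
    return (client_id << 32) | local_id


def split_id(merged: int) -> tuple[int, int]:
    """Recover (client_id, local_id) from a merged id."""
    return merged >> 32, merged & 0xFFFFFFFF


first = make_id(7, 1)    # 30064771073
second = make_id(7, 2)   # next id from the same client
other = make_id(8, 1)    # first id from a different client
```

With this layout, all ids from one client sort together and in creation order (`first < second < other`), so new rows from a given client always land after that client's previous rows. Whether thousands of such per-client insertion points fragment a clustered index noticeably is the open question above; the sketch only shows the key layout, not an answer to it.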

Are we overlooking something? Are there any other approaches worth investigating? A discussion of the pros and cons of each approach would be quite helpful.

Ultimately, the "best" strategy is dependent upon your other needs. These are all very valid options. Perhaps you should rephrase your question as, "In what scenarios do each of these primary key strategies make the most sense?" — Jaxidian, Apr 22 '13 at 18:08
I know that for our similar database/app structure we have users sign up with an account first, and we base all of their data storage on that account. Dealing with the account first also handles all the legal obligations and the rights for storing the information. — HalR, Apr 23 '13 at 00:50
@Jaxidian Although the question you propose would make for an interesting discussion, I am looking for a solution that would be most appropriate for our case as outlined in the question. — user1843640, Apr 23 '13 at 17:40
@HalR We can't use the account id because a user connecting with multiple devices could cause a conflict. This is why we are considering a client/device id as part of the primary key. — user1843640, Apr 23 '13 at 17:43
One of the reasons we use accountID is so that users can sync on multiple devices and have the same info on all of them. — HalR, Apr 23 '13 at 17:47
@user1843640 Unfortunately, we cannot know which of these is best for you. Some people have policy or technical reasons against using composite primary keys - you've given us no indication of that. Some people love using GUIDs as primary keys because the fragmentation is unimportant while others cannot do such a thing. For some systems, if the data is one-way, then your identity solution is great but with two-way data, this gets messy. We simply don't have enough details to analyze each (or any) of these scenarios for you because we don't know what compromises can be or have been made. — Jaxidian, Apr 23 '13 at 20:35
You might be interested in this question: http://stackoverflow.com/questions/16263250/android-or-distributed-application-primary-key-strategy, I offered a bounty for it. — ChrLipp, Jul 01 '13 at 08:56
And now I offered a second bounty, but this is the last time ;-) — ChrLipp, Jun 03 '14 at 20:55
Have a look at the chapter "Primary and other keys" of my book "[Programming with databases](https://www.amazon.com/dp/2956300806)"; this topic is considered there. — serge, Feb 26 '19 at 12:47

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

I've considered this question at length and came to the decision that a GUID is usually the best solution. Here's a little information on why:

Identity

The Identity option sounds like it removes all the negatives, but having implemented a Single Page Web App that implemented this system, I can tell you it adds a significant amount of complexity to the code. A temporary id can spread through your client side data quite quickly, and it's really hard to create a system that has no holes in it when it comes to finding every single possible usage. It usually leads to application and data specific hard-coded information to track foreign keys on the client (which is tedious and error prone as the database changes and you forget to update this information). It also adds a lot of overhead to every sync, as it might have to run through multiple tables each sync to check for temporary ids. There might be a better way to implement this system, but I haven't seen a good approach that doesn't add a ton of complexity and possible ugly error states in your data.

Composite

The composite approaches also add a lot of complexity to your code in generating session ids and creating ids from them, and their only real advantage over GUIDs is that uniqueness is guaranteed. A GUID, however, is unique for all practical purposes. I was initially worried about the possibility of repeats, but I realized that the chance is infinitesimally small and that there is a really easy way to handle the rare case where a GUID is not unique.
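To put a rough number on "infinitesimally small", the standard birthday approximation can be used. This sketch assumes every key is a version 4 UUID (122 random bits) drawn from a good random source; the function name is hypothetical.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 122 random bits; the rest are fixed


def collision_probability(n: int) -> float:
    """Approximate chance that n random version 4 UUIDs contain at least one duplicate.

    Uses the birthday approximation p = 1 - exp(-n * (n - 1) / (2 * N)),
    where N is the number of possible values.
    """
    space = 2 ** RANDOM_BITS
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


# One billion keys in a single table, far beyond "millions of rows".
p = collision_probability(10**9)
```

For a billion keys the result is on the order of 1e-19, which is why a cheap recovery path for the collision case is usually considered sufficient.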

GUIDs

My biggest worries about using a GUID were:

They are large and aren't traditional ints, which will make transferring large amounts of data slower and degrade database performance.
If you ever do run into a conflict, it can ruin your app, so you have to write complex code to handle a situation you will probably never encounter.

Then I realized that in an offline style web app, you're not usually transferring large amounts of data at once because it's all stored on the client.

You also don't worry about server database performance much either because that's done behind the scenes in a sync - you just worry about client side data performance.

Last, I realized that handling a conflict is really a trivial thing. Just test for a conflict and, if you get one, create a new GUID on the server and continue with the operation. Then send a message back to the client that causes it to show a short error message, delete all client-side data, and re-download it fresh from the server. This is really quick and easy to implement, and you probably already want this as a possible operation in an offline web app anyway. While it might sound inconvenient for the user, the likelihood of the user ever seeing this error is almost 0%.
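The conflict handling described above could be sketched roughly like this. An in-memory SQLite database stands in for the central store purely for illustration; the `task` table, the `insert_from_client` function and the "resync" flag are hypothetical names.

```python
import sqlite3
import uuid

# Stand-in for the central database.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE task (id TEXT PRIMARY KEY, title TEXT NOT NULL)")


def insert_from_client(conn, row_id, title):
    """Insert a synced row and return (stored_id, client_must_resync)."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO task (id, title) VALUES (?, ?)", (row_id, title)
            )
        return row_id, False
    except sqlite3.IntegrityError:
        # The key already exists: store the row under a fresh server-side
        # GUID and tell the client to drop its local data and re-download.
        new_id = str(uuid.uuid4())
        with conn:
            conn.execute(
                "INSERT INTO task (id, title) VALUES (?, ?)", (new_id, title)
            )
        return new_id, True


client_key = str(uuid.uuid4())
first = insert_from_client(server, client_key, "Created on the phone")
# Simulate another device that happened to generate the same key.
second = insert_from_client(server, client_key, "Created on the tablet")
```

Here `first` comes back with the original key and no resync, while `second` comes back with a new key and the resync flag set. One caveat this sketch ignores: a real sync endpoint would also need to tell a genuine collision apart from the same client resending a row it had already uploaded.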

Conclusion

In the end, I think that for this type of app, GUIDs are the easiest to implement and work best, with the least possibility for error and without creating overly complex code.

If your application doesn't have to run offline, but you have a client-side database for performance or other reasons, you can also consider throwing up a loading gif and pausing client side execution until the id is returned via ajax from the server.

score 1 · Answer 2 · answered Apr 22 '13 at 18:15

The key (pun intended) thing to remember is to simply have a unique key for each object you are storing on the persistent store. How you handle the storage of that object is completely up to you and up to the methodology of how you access that key. Each of the strategies you listed have their own reasons for why they do what they do but in the end they are storing a key for a certain object in the db so all of its attributes can be changed while retaining the same object reference in the database.
