database design for distributed application

Question

What is the best way to have a distributed, not-always-connected database across multiple clients and a server environment?

I am building a multi-tenant web app, that also has a detached Windows client. Entities in the system include contacts, tasks, activities, etc. A lot of these entities are created at the client end, while it is disconnected from the server. A client may never connect to the server if it operates in standalone mode (which is the current version).

Now I want to make sure that when an entity is created at the client, it does not create a conflict at the server end. This appears to be a perfect job for using GUIDs as primary keys, and while it is not a problem at the client end where the number of entities may be small, I am concerned that it will potentially be an issue at the server end where the random nature of the GUIDs will make lookups very inefficient.

Additionally, at what point (ie which table) do you no longer need to use GUIDs at primary keys - for example say a task has a series of notes (1:N) - there's nothing specifically special about notes, they just need to be accessible - is there an efficient way of keying these sorts of entities?

What kind of multi-tenant architecture are you using? Do the users share tables? That is, do the tables store rows from multiple users? — Mike Sherrill 'Cat Recall', Oct 14 '12 at 19:31
Hi Catcall - that is the intention. ie each table stores all customer's data — Marcin, Oct 16 '12 at 14:27

score 3 · Accepted Answer · answered Oct 12 '12 at 18:08

3

When you use GUIDs, performance hit will not be on lookups, only when inserting new data into indexes. And that also depends on how your dbms treats GUID values.

I would recommend using GUIDs anyway, at least as a transfer ID.

answered Oct 12 '12 at 18:08

ali köksal

1,217
1
12
19

Thanks for the pointer re: where the performance hit is. What do you mean by 'transfer ID' - is that an id that is independent of the client and the server? IF that's the case, are you suggesting that each table has a (indexed) transfer ID column for the purpose of synchronisation? – Marcin Oct 16 '12 at 14:31
You can either use guids as primary keys / identity columns for each of your tables and use the same id values for client and server databases. Or, if you have master-detail tables, use a guid column (to be used as transfer key) in each of your master tables, move your detail tables with master records, and use different id values on server and clent databases. I would recommend the first solution. The second solution needs you to move your records in logical units and does not cope well with future changes. – ali köksal Oct 16 '12 at 16:16

score 1 · Answer 2 · answered Oct 16 '12 at 15:00

Now I want to make sure that when an entity is created at the client, it does not create a conflict at the server end. This appears to be a perfect job for using GUIDs as primary keys

A GUID will guarantee that each row is unique. But unique rows isn't usually the real problem.

If the tenant is a company, the company's id number will appear in each row. If it's possible for two users from the same company to insert a row that differs only in the GUID, you still have to deal with potential conflicts.

For example, imagine a company that has two users entering data in a table of states.

              guid    employer_id  state_code  state_name
              --
user1 enters  guid-1  12345        AL          Alabama
user2 enters  guid-2  12345        AL          Alabama

Data integrity requires more constraints that just the primary key constraint on the GUID column. (Especially if disconnected clients are entering data in other tables related by foreign key references.)

You don't want the GUID to be a clustered key. Some dbms will by default make a primary key the clustered key. If your dbms does that, it probably has a way to tell it not to, maybe by a declaration like your_column_name guid primary key nonclustered.

Thanks - I don't think that the scenario you mention will be an issue in my particular application but I will keep an eye on it. On the clustered guids issue - thanks for that heads up - I've searched on that issue (eg http://stackoverflow.com/questions/583001/improving-performance-of-cluster-index-guid-primary-key) and now I know! — Marcin, Oct 18 '12 at 01:26
It's true in general; it's just easier to illustrate the way I did it. Data integrity requires unique constraints on data that's important to the user's business--'state_code' and 'state_name' above. ('guid' and 'employer_id' are important to *your* business, not to the user's business.) — Mike Sherrill 'Cat Recall', Oct 18 '12 at 04:21

score -2 · Answer 3 · answered Oct 12 '12 at 07:59

-2

OK. Just to be sure, I'm assuming you don't really mean "entities" when you say clients create them. I'm guessing you mean rows in existing tables. Just shout if that's not true.

In any case, the problem here is unique naming. For your case, prefix all rows with the userid. As long as users are unique, the records will not clash.

Your remaining concern will now be if records should clash. That is generally not a problem that can be automated away, and is possibly beyond the scope of the question.

answered Oct 12 '12 at 07:59

emperorz

429
3
9

Yeah, sorry, yes I mean rows of a table. That's a pretty good explanation - so basically you're saying, at the client end just use int, and at the server end we just prefix the userid to all the primary keys? – Marcin Oct 12 '12 at 12:33
What if a user installs the windows client twice? – ali köksal Oct 12 '12 at 18:11
@Marcin - yes, though it may be easier to have the userid in both locations so they remain exactly the same. That's totally up to you. – emperorz Oct 12 '12 at 18:58
@alikox - I would only allow one client to exist at one time per user. Failing that, a "clientid" may be the way to go. It would be fast and can be short too. GUIDs tend to be long. As ever, try it! See which works best for the particular situation. – emperorz Oct 13 '12 at 07:11
Alikox makes a good point - eg having a windows client and an ipad client, and the like. Or if they use 2 separate computers with the same account linked to them. – Marcin Oct 16 '12 at 14:32
@Marcin - If GUID's feel most appropriate, then go for it, though I'm not sure how you'll retrieve the records or avoid logical clashes - ref Catcall's answer above. Anyway, given the voting, I guess you've made your mind up already. Good luck! – emperorz Oct 20 '12 at 10:08

database design for distributed application

3 Answers3