
Imagine a large corporation with dozens of companies, each with its own website, and each website with its own unique functional requirements

  • Most data on each website will be specific to that website

    • Each website can edit its own data
  • Some data will be shared across all websites

    • There will be a central CMS that is allowed to edit this data, but other websites can read and use that data

For example, say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products: some in the same category (cereal, food), others in completely different categories (books, instruments). Some sites are marketing websites, some are for CRM, and some are online stores.

  • there is a list of regulatory requirements that affect all products
  • each company should manage the status of compliance of its own products to each requirement
  • when a new requirement surfaces, details regarding that requirement should only be entered once

How would the multiple databases be coordinated?

edit: added more info per Bob's suggestions

Thanks for the incredibly insightful questions!

  • compliance data is not shared; it is siloed within each site
  • shared data lives only in the one enterprise-wide database; it will mostly be "types of [thing]"
  • no conclusive list of instances where it will be used, but currently it would populate CMS dropdowns for individual sites
  • changes to shared data would occur a few times a year.
  • Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
  • very low volume in shared data.
  • All DBs will be new, decision on which DB is pending current investigation.
  • Sub-systems will expose REST api
CheapSteaks
  • 4,821
  • 3
  • 31
  • 48
  • 1
    It would be very helpful to know how the two lists combine: Is the compliance data shared or not? What data is shared (apart from possibly compliance data) and what is it used for? How often does it change? How quickly and reliably do the changes need to be seen in all the places the shared data is needed? What data volume do you expect in the stream of changes? Do these databases already exist? Are they (or must they be, if they don't currently exist) the same kind of database e.g. SQL Server / MySQL etc? Are there any APIs exposed by the sub-systems? – Bob Salmon Apr 25 '16 at 14:45
  • Thanks for the incredibly insightful questions! Compliance data is not shared, shared data is only on the one enterprise-wide database, they will mostly be "types of [thing]", no conclusive list of instances where they'll be used but currently it'd be to populate CMS dropdowns for individual sites. It would change a few times a year. Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable. Very low volume in shared data. All DBs will be new, decision on which is pending current investigation. sub-systems will expose REST api – CheapSteaks Apr 25 '16 at 15:18
  • I haven't finished with the questions ;-). Would the shared data be stored in the same structures in the various sub-systems or could it be stored differently? Do you have a house style / more in-house skills in coding or in databases? Is the set of sub-systems already bound together e.g. via a shared message bus? Are the sub-systems near each other in network terms? The problem you describe seems quite a small synchronisation one - low volume, not often, unchallenging latency requirements, consistency isn't too onerous. – Bob Salmon Apr 25 '16 at 15:59
  • No requirements on the structure of how shared data will be stored in sub-systems. In-house skills in both but this would be the first time working with more than one db at a time. Sub systems will not have interaction with each other, and currently don't exist (planning phase of greenfield project), they will ideally all be on AWS – CheapSteaks Apr 25 '16 at 17:43

3 Answers


Here are some ways I have seen this handled; you need to think about the implications of each structure based on the details of your particular business domain. All of them can work, but all have to be carefully set up if they are going to work.

One database for shared information and one per client for client-specific information. Set up the overall application so that the first thing a user selects on login is the client, and the application connects to the correct client database. Users may also need a way to change the client if they will handle multiple clients.

Separate servers for each client if they need to be completely siloed. Database changes are made by script (kept in source control) and applied to each server as needed. The central database might then have a job that runs to push any shared data changes to the other servers.
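As a rough illustration of that push job, here is a minimal sketch using in-memory SQLite databases as stand-ins for the real servers (the `requirement_types` schema and a full wipe-and-reload strategy are assumptions, not something from the question):

```python
import sqlite3

def push_shared_rows(central, clients):
    """Copy every row of the shared table from the central DB to each
    client DB, replacing what is there (simplest possible push job)."""
    rows = central.execute("SELECT id, name FROM requirement_types").fetchall()
    for client in clients:
        client.execute("DELETE FROM requirement_types")
        client.executemany("INSERT INTO requirement_types VALUES (?, ?)", rows)
        client.commit()

# Stand-in databases; a real deployment would hold one connection
# per server instead of in-memory handles.
ddl = "CREATE TABLE requirement_types (id INTEGER PRIMARY KEY, name TEXT)"
central = sqlite3.connect(":memory:")
client_a = sqlite3.connect(":memory:")
client_b = sqlite3.connect(":memory:")
for db in (central, client_a, client_b):
    db.execute(ddl)

central.execute("INSERT INTO requirement_types VALUES (1, 'allergen labelling')")
push_shared_rows(central, [client_a, client_b])
```

Given the stated volume (a handful of changes per year), a full reload like this is simpler and safer than diff-based incremental sync.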

All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
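A minimal sketch of the `client_id` approach with per-client views, using SQLite as a stand-in (the real engine is still undecided per the question, and the table and view names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (
        id        INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL,   -- every table carries the owning client
        name      TEXT NOT NULL
    );
    -- One view per client so users only see rows for their own client.
    CREATE VIEW products_client_1 AS
        SELECT id, name FROM products WHERE client_id = 1;
    CREATE VIEW products_client_2 AS
        SELECT id, name FROM products WHERE client_id = 2;
""")
con.executemany(
    "INSERT INTO products (client_id, name) VALUES (?, ?)",
    [(1, "Corn Flakes"), (1, "Granola"), (2, "Guitar Strings")],
)

client_1_rows = con.execute("SELECT name FROM products_client_1 ORDER BY id").fetchall()
client_2_rows = con.execute("SELECT name FROM products_client_2 ORDER BY id").fetchall()
```

In a production engine you would grant each client's users permissions on their view only, never on the base table.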

And since you are in a regulatory environment, I strongly urge you to create an audit database for each database, updated by database triggers (never audit from the application; you will lose changes to the data).
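A trigger-based audit trail can be sketched like this, again with SQLite as a stand-in (the `compliance` schema is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE compliance (
        product_id INTEGER PRIMARY KEY,
        status     TEXT NOT NULL
    );
    -- Audit table populated by a trigger, not by the application,
    -- so no code path can change data without leaving a record.
    CREATE TABLE compliance_audit (
        product_id INTEGER,
        old_status TEXT,
        new_status TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER compliance_update AFTER UPDATE ON compliance
    BEGIN
        INSERT INTO compliance_audit (product_id, old_status, new_status)
        VALUES (OLD.product_id, OLD.status, NEW.status);
    END;
""")
con.execute("INSERT INTO compliance VALUES (1, 'pending')")
con.execute("UPDATE compliance SET status = 'approved' WHERE product_id = 1")
audit_rows = con.execute(
    "SELECT product_id, old_status, new_status FROM compliance_audit"
).fetchall()
```

The same pattern (AFTER UPDATE / AFTER DELETE triggers writing old and new values to an audit table) carries over to SQL Server, MySQL, PostgreSQL, etc.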

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • thank you for the advice! the regulatory example was just an example for the kind of data structure that we might have, but taking auditing into consideration is definitely a good idea – CheapSteaks Apr 25 '16 at 18:01

I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.

In many cases where you have shared data, you want a single point of ownership of that information. It could be in a database, in an Excel file (which can then be turned into CSV and periodically loaded into all databases), or some other form. The specifics depend on what exactly is shared.
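The periodic CSV load can be sketched as follows (the `category_types` data and schema are hypothetical; a wipe-and-reload is fine at the stated volume and change frequency):

```python
import csv
import io
import sqlite3

# Shared "types of [thing]" data, exported as CSV from the owning system.
csv_text = "id,name\n1,cereal\n2,books\n3,instruments\n"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE category_types (id INTEGER PRIMARY KEY, name TEXT)")

# Periodic load: clear the table and reload everything from the export.
con.execute("DELETE FROM category_types")
con.executemany(
    "INSERT INTO category_types VALUES (?, ?)",
    [(int(row["id"]), row["name"]) for row in csv.DictReader(io.StringIO(csv_text))],
)
names = [r[0] for r in con.execute("SELECT name FROM category_types ORDER BY id")]
```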

Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information and they will manage that data, which will then be shared to the other sites. This might be done with an application they manage which aggregates information from the other companies or it could be data which is pushed to their systems.

A final point:

Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182

I agree with Chris that, even after both sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have two separate dbs per application (one with shared data and one without)? This would influence the kind of replication.

Or you could have a purely code solution, where clicking publish in a GUI that updates the central db calls a set of APIs that also update the other dbs. Or micro-services - updating the central db also creates a message on a shared queue, that is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
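The queue-based variant can be sketched with in-process queues standing in for the shared bus (on AWS this might be SNS fanning out to one SQS queue per site; all names below are hypothetical):

```python
import queue

# One queue per subscriber simulates fan-out on a shared message bus.
subscriber_queues = [queue.Queue(), queue.Queue()]

def publish_change(record):
    """The CMS "publish" action: update the central store, then
    broadcast the change onto every subscriber's queue."""
    central_store[record["id"]] = record["name"]
    for q in subscriber_queues:
        q.put(record)

def apply_pending(q, store):
    """Each site's service drains its own queue and applies updates
    in whatever form makes sense for that site's database."""
    while not q.empty():
        record = q.get()
        store[record["id"]] = record["name"]

central_store, site_a, site_b = {}, {}, {}
publish_change({"id": 7, "name": "organic certification"})
apply_pending(subscriber_queues[0], site_a)
apply_pending(subscriber_queues[1], site_b)
```

A real bus also buys you durability and retries, which matters if "seen within an hour" must hold even when a site is temporarily down.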

It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.

So this is as much an architecture question as it is a db question.

Bob Salmon
  • 411
  • 4
  • 10
  • Thanks! db-level replication and 2 separate dbs sounds appealing. Is there a name for whatever the application (or db?) would have to do to normalize the data for end-users? – CheapSteaks Apr 25 '16 at 18:07
  • What do you mean by normalise the data for end users? – Bob Salmon Apr 25 '16 at 18:51
  • Sorry - SO locked my comment so I couldn't edit it. If you mean "copy", then it is often referred to as replication or synchronisation. For instance in MySQL: http://dev.mysql.com/doc/refman/5.7/en/replication.html – Bob Salmon Apr 25 '16 at 19:00
  • e.g. with 2 DBs, the application would need to be able to merge the data from both DBs when presenting to the CMS user. Since they're separate DBs we won't be able to perform things like JOINs right? If I'm using IDs from something in a different DB as the FK, are there things I should watch out for? Would like to do a bit more reading on this area, but not sure what I should search for – CheapSteaks Apr 25 '16 at 19:37
  • 1
    Ah! You mean something like this: http://stackoverflow.com/questions/1565993/oracle-database-link-mysql-equivalent. I think that federation is a useful word to search for, or "view across databases". – Bob Salmon Apr 25 '16 at 19:42
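To illustrate the cross-database JOIN question from the comments: SQLite's ATTACH is one concrete form of the "database link" / federation idea, letting an ordinary JOIN span two separate database files (the schema below is hypothetical):

```python
import os
import sqlite3
import tempfile

# Two separate database files: a central one holding shared types, and a
# site-local one whose products reference those types by id.
central_path = os.path.join(tempfile.mkdtemp(), "central.db")
site_path = os.path.join(tempfile.mkdtemp(), "site.db")

central = sqlite3.connect(central_path)
central.execute("CREATE TABLE category_types (id INTEGER PRIMARY KEY, name TEXT)")
central.execute("INSERT INTO category_types VALUES (1, 'cereal')")
central.commit()
central.close()

site = sqlite3.connect(site_path)
site.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT)"
)
site.execute("INSERT INTO products VALUES (10, 1, 'Corn Flakes')")

# ATTACH makes the central db visible inside the site connection,
# so an ordinary JOIN works across the two files.
site.execute("ATTACH DATABASE ? AS central", (central_path,))
joined = site.execute("""
    SELECT p.name, c.name
    FROM products p
    JOIN central.category_types c ON c.id = p.category_id
""").fetchall()
```

The equivalents elsewhere are Oracle database links, SQL Server linked servers, and PostgreSQL foreign data wrappers; cross-db ids used as foreign keys cannot be enforced by the database itself, so the application must guard referential integrity.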