
I'm a big fan of data integrity when it comes to databases. My way of thinking is:

  • It's better to enforce a constraint directly in the database (for example, a foreign key) than either to wait for the application to crash because a row didn't exist in a table, or to add that check manually everywhere in the application code.

The problem

We have a few tables that are duplicated in multiple databases and used by multiple applications. One example is the User table.

We have users in nearly all applications. The issue is that those users are currently treated separately when they should, at least I think, all be grouped together. Right now there are many duplicates all over the place and outdated user information everywhere. When a user is updated in one database/application, it is not updated in the others, even though it's the same user whose information should be updated everywhere.

What we are planning to do

We were thinking about creating a reference database to regroup all this information. For example, all the information about users would be stored in a single database, and every application could use that database to access the information it needs about its users.

Question 1: Is this a good idea? Are there other alternatives to avoid duplicate/outdated data all over the place?

The new problem

While grouping all the user information into a single database would fix the problem of duplicate/outdated user information, it creates a new problem with data integrity:

  • We can no longer create a foreign key constraint on the User table, since it's located in another database.

Sure, the user tables are easy to access with views, but you can't apply foreign key constraints to views either...
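
For illustration, this is the kind of constraint that gets rejected (AppDB and UserDB are made-up names):

    USE AppDB;

    CREATE TABLE dbo.Orders (
        OrderId INT IDENTITY PRIMARY KEY,
        UserId  INT NOT NULL,
        -- SQL Server rejects this: "Cross-database foreign key
        -- references are not supported."
        CONSTRAINT FK_Orders_User FOREIGN KEY (UserId)
            REFERENCES UserDB.dbo.[User](UserId)
    );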

Question 2: So, what are my options if I want to keep the data integrity constraints directly in the database?

Gudradain
  • Your question is too broad. It really depends on what you value more, data integrity or data centralization. We know nothing about your business. Are you heavy on audit, or do you not care about changes in your DB? If data integrity is important, keep separate users and create synchronization jobs to keep them fresh in each database. – T.S. Jun 20 '14 at 19:41
  • Out of scope for the question, but it sounds more like you should really have one database, where the different applications have, in some cases, their own schema (like dbo), and do more to coordinate development efforts among your application teams. – Joel Coehoorn Jun 20 '14 at 19:43
  • @T.S. I'm aware that the question is broad, but I don't want to ask a question that is too specific and miss a much better solution to the real problem, which is data integrity and centralization. The most important point is data centralization. – Gudradain Jun 20 '14 at 19:53
  • You could have a central db for the User and the scalar properties: CentralDB.dbo.CentralUser.CentralUserSurrogateKey. You can add the scalar data like LastName and FirstName here. Then in each db, you have MyApplicationDB.dbo.User.UserSurrogateKey (PK) and MyApplicationDB.dbo.User.CentralUserSurrogateKey (a "loose" foreign'ish key). These 2 columns are the only thing in this table. You FK within MyApplicationDB using this UserSurrogateKey. You can write a trigger on each table to make sure the User.CentralUserSurrogateKey exists in CentralUser.CentralUserSurrogateKey (a sketch of such a trigger follows these comments). Not great, I know. – granadaCoder Jun 20 '14 at 20:04
  • I actually did this one time with the ASP.NET Membership provider (aspnet_Users table). We kept "Users" in this table, and used the ASP.NET membership provider to create/update and deal with users. Then we wrote an "aspnet_Users to my app" plugin that would do what I described above: put that aspnet_Users.UserId (a copy) in the "MyAppDB". It worked OK. But with "FKs cannot cross database boundaries", you're limited in what you can do. – granadaCoder Jun 20 '14 at 20:18
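
A rough sketch of the trigger granadaCoder describes above, reusing the illustrative object names from that comment (CentralDB, MyApplicationDB, CentralUserSurrogateKey):

    USE MyApplicationDB;
    GO
    -- Reject inserts/updates whose CentralUserSurrogateKey has no
    -- matching row in the central database (a "loose" foreign key).
    CREATE TRIGGER dbo.trg_User_CheckCentralUser
    ON dbo.[User]
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        IF EXISTS (
            SELECT 1 FROM inserted i
            WHERE NOT EXISTS (
                SELECT 1 FROM CentralDB.dbo.CentralUser c
                WHERE c.CentralUserSurrogateKey = i.CentralUserSurrogateKey))
        BEGIN
            RAISERROR('CentralUserSurrogateKey not found in CentralDB.dbo.CentralUser.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;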

2 Answers


Question 1: Is this a good idea? Are there other alternatives to avoid duplicate/outdated data all over the place?

The problem you're facing isn't unique. Applications have been designed for ages to operate within their own scope, own their own data, and be "self contained." As the industry has realized the cost of maintaining separate systems, and the value of data quality, it has striven to improve quality and reduce overhead. What you're getting into is the reason to have N-tier development and common, company-specific code: for example, a service or "DLL" that abstracts the data layer, allowing developers to remain unaware of the database while gaining control over common information. Is this a good idea? If your company is growing, you really can't afford not to.

An alternative is to identify a single authoritative source and replicate information from it to all other sources, or to require subsystems to report back to the authoritative source when they change information, managing conflicts if data is changed in both places.

Question 2: So, what are my options if I want to keep the data integrity constraints directly in the database?

Identify an authoritative source and set up replication between the applications. Ensure that updates made in subsystems propagate to the master, and that the master propagates them to its children.

Still require that the unique identifier exist in the various systems, which lets you manage integrity locally, but keep all the "associated values" that are costly to maintain in the authoritative source. Consider requiring an association between user and application in the user database, which would add a layer of security to the systems and record when users are, or are no longer, in the system (not just the existence of a record, but start and end dates; a user with no end date still has access to the application).
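
One possible shape for that, sticking with the reference-database idea (table and column names are invented for illustration):

    USE UserDB;

    CREATE TABLE dbo.[User] (
        UserId    INT IDENTITY PRIMARY KEY,
        Email     NVARCHAR(256) NOT NULL UNIQUE,
        FirstName NVARCHAR(100) NOT NULL,
        LastName  NVARCHAR(100) NOT NULL
    );

    -- One row per user per application: the date range records when
    -- the user had access, not merely that a record exists.
    CREATE TABLE dbo.UserApplication (
        UserId        INT  NOT NULL REFERENCES dbo.[User](UserId),
        ApplicationId INT  NOT NULL,
        StartDate     DATE NOT NULL,
        EndDate       DATE NULL,  -- NULL = user still has access
        PRIMARY KEY (UserId, ApplicationId, StartDate)
    );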

xQbert

I would look into the transactional replication feature if you are using SQL Server.

It can be configured to be bi-directional or one-way.

One-way - Changes to your master user repository will be pushed out as transactional changes to subscribers; the subscriber can also poll and pull from the publisher. It is important to note that in this setup your local copy of the user table cannot be written to, or replication will fail, warranting a re-initialization of the subscription.
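
The publisher-side setup for the one-way case looks roughly like this; database, publication, and server names are placeholders, and the snapshot/distribution agents still have to be configured separately:

    -- On the publisher (CentralUserDB, UserPublication, AppServer
    -- and AppDB are placeholder names):
    USE master;
    EXEC sp_replicationdboption
        @dbname = N'CentralUserDB', @optname = N'publish', @value = N'true';

    USE CentralUserDB;
    EXEC sp_addpublication
        @publication = N'UserPublication', @status = N'active';

    -- Publish the User table as an article.
    EXEC sp_addarticle
        @publication = N'UserPublication', @article = N'User',
        @source_owner = N'dbo', @source_object = N'User';

    -- Push subscription for one application database.
    EXEC sp_addsubscription
        @publication = N'UserPublication', @subscriber = N'AppServer',
        @destination_db = N'AppDB', @subscription_type = N'push';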

Bi-directional - This is by nature more chatty; however, both the publisher and the subscriber can be updated.

The whole process can optionally be orchestrated at a higher level using a dedicated SQL instance set up as a distributor to monitor multiple publications and subscriptions.

Why reinvent the wheel? If you are using MSSQL Enterprise, this is standard practice.

Ross Bush
  • Would you say: "Put as much data as possible in the global database" or "Put as little data as possible in the global database"? For example, if I create a minimalist user table in the global database with email, first name and last name, it should be enough to uniquely identify a specific user. Then I could keep the data specific to a particular application in that application's database. But I still have some risk of duplicate data :S – Gudradain Jun 20 '14 at 20:11
  • Yes, I bet all your applications use the same subset of data, such as the fields you listed. Isolating that data, along with the user's primary key, into a shared "User" table would satisfy your foreign key constraints and give you near real-time updates to user names, passwords, email, etc. Any data that is specific to a particular application should be keyed to the user but stored in another table, in my opinion. This is from a pure database view. You might have other constraints in your applications that invalidate this. – Ross Bush Jun 20 '14 at 20:19