5

To give an idea of what I'm talking about, consider an entity (in my case, it's a Task) which could be linked to any number of other entities in the system. For our purposes let's say the task could be linked to:

  • Project
  • Account
  • Ticket
  • Person
  • etc

All of these are represented with their own tables in the database. Now, a task could potentially be linked to any one of those, and due to the system being in active development, the list of potential links will continue to grow relatively quickly. Note these are 1 to many relationships - a task can only be linked to one of these at a time, but a single Account could have multiple tasks tied to it.

Now, I have considered a few options for this, however I do not consider myself any kind of expert in database design, so I figured I'd reach out. Options I've considered thus-far include:

  • A foreign key for each link in the Task table, and we just have to keep adding columns. However since a task cannot be linked to more than one of them at a time, this will result in a lot of FK columns with NULL values. This also will require a new column and regeneration of our database model in our application whenever we add a new link.

  • A single column in Task that acts as a foreign key, but include another column specifying a link type, so when querying against it we can determine which JOINs happen based on type. So both Account's and Person's IDs would be in this column for their tasks, but the link type column would specify whether the ID is a person or an account. This feels very risky to me and obviously the constraints can't be enforced by the database..

  • Other options??

I would love if someone was able to point me in the direction of a "cleaner" design, but if not, would the multiple columns acting as FK constraints, but allowing NULL be the best bet?

Thanks in advance!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
CTDev
  • 226
  • 2
  • 12
  • I faced a similar challenge several year ago. Without much thinking, I opted for your second option. Instead of using object_id in sys.object, I created a new table that contains which constant mean which table. I've heard somewhere that this design is bad, but I have never heard why. Five years later, this type of links was used thoroughly almost everywhere in the application (Comments, Documents, Parameter, etc.). In a way, it became the core of the application. At the time, I didn't expect that, but it has made me proud to see how is has evolved. – AXMIM Oct 18 '16 at 20:17
  • Following my previous comment, since I don't work there anymore, so I can't tell if they are now hurting a downside effect with it. It has holded pretty solidly for 5 years. However, I must admit I've seen SQL server do some weird query plan one or twice because of it and there is some small little drawbacks here and there. Also, entity-framework wasn't around. I'm not sure I would trust EF to handle these joins. – AXMIM Oct 18 '16 at 20:22
  • 2
    This is a new instance of the recurring *polymorphic associations* question. Once you know the official term it's easy to find ways how to do it. – Gert Arnold Oct 18 '16 at 20:31
  • Interesting to hear another person facing a similar issue. We actually HAVE implemented the second option in another area of the application, and it seems to work well, but at the same time we are still in the early stages of development and havent been able to stress test much yet (hell, it's hard for me to get them to let me even set up unit tests, let alone run stress testing...sigh, PMs.). It's funny you mention the execution plans as that was something I was curious about, I assumed that with constraints in place that the execution plan would be much more efficient and scale better.. hmm – CTDev Oct 18 '16 at 21:31
  • Gert - thanks for that info, googling Polymorphic associations right now :) – CTDev Oct 18 '16 at 21:31
  • Just wanted to say, after googling polymorphic association, I found some genius solutions! Thanks Gert Arnold! For those who look for this later, I'll reference this post (http://stackoverflow.com/questions/2002985/mysql-conditional-foreign-key-constraints), it's pretty straightforward and provides a clean solution maintaining referential integrity. Only catch is if you're in a DB that hasn't implemented GUIDs you're SOL taking that approach (at least the way I'm understanding it) – CTDev Oct 18 '16 at 22:11
  • Take a look at http://stackoverflow.com/search?tab=votes&q=user%3a196713%20subtype – Damir Sudarevic Oct 19 '16 at 17:56

3 Answers3

3

I would use first option.

Cons:

  1. Add new column when you add new table - As you are already editing database by adding new table, adding one column should not be problem.
  2. NULL values in many columns - It does not have big impact on performance or anything else. You can use default values instead of NULL if it fits you better. See this question (SQL Server - Performance/Size Drawbacks of Null Columns) and answers

But on the flip side, you get more robust relations, understandable joins, much more appropriate entity framework mappings, easier queries ant etc.

Community
  • 1
  • 1
Adil Mammadov
  • 8,476
  • 4
  • 31
  • 59
  • Agreed on this - I believe this is the way I'm going to go. It can be a bit of a pain writing against it in our application, but as you point out the benefits far outweigh the cons. – CTDev Oct 18 '16 at 21:28
  • @CTDev, I am happy that I formed some opinion, but be sure to investigate well before implementing it. Try searching *polymorphic associations* as GiladGreen pointed out. – Adil Mammadov Oct 18 '16 at 21:38
  • 1
    I read up on it and definitely found a much better solution. Implementing a table similar to an interface in OOP (I called it "Taskable"), I was able to build a relationship where, on each insert in to say Account, I also insert in to Taskable, the primary key from Account that was just inserted. Task then references IDs in Taskable, and I implement FKs from the PKs in Account, Project, etc. By doing that, so long as PKs in my tables are GUIDs, I have a direct path with referential integrity preserved. Learned a lot today! – CTDev Oct 18 '16 at 22:14
  • This is the example that led me to that solution: http://stackoverflow.com/questions/2002985/mysql-conditional-foreign-key-constraints – CTDev Oct 18 '16 at 22:14
  • @CTDev, thanks for information, I will also take a look at that solution – Adil Mammadov Oct 18 '16 at 22:18
1

I have found in the past that with proper consideration of design that this is not necessary. For example an account can have many projects. An account can have many persons. A project can have many tasks. So tasks only relates to projects.

If that really does not work then you can consider a tasks table for each type. Project tasks, account tasks, etc. This will improve query performance.

You would then want a domain rule to ensure that all of your task tables adhere to a specific schema.

I learned about the domain rule in college but never implemented it in the real world so I don't know how it could be done in SQL server. In real world scenarios it has always worked out as I specified in the first paragraph.

Hope this helps. Otherwise the other two answers here make sense.

Joe C
  • 3,925
  • 2
  • 11
  • 31
  • Thanks for the insight - it's a weird position where a task can be tied to a project (which, in turn is tied to an account), and therefore it IS tied to the account as well (indirectly). However, they also want the task to be able to be tied to the Account directly. Im not 100% sure how to go about it as the Task entity is the same no matter what its tied to so I was hoping to use the single Task table, but at the same time the referential integrity becomes a problem. That said I'll take a look at the domain rule, thanks for the info! – CTDev Oct 18 '16 at 21:25
  • I think the key here is 'they also want the task to be able to be tied to the Account directly.' I have found that examining the business rules with the stakeholders almost always allows for a normalized design. Those can usually be handled through UI and Reporting and most frameworks plug into normalized databases so it pays in the long run. – Joe C Oct 18 '16 at 23:53
0

Actually, an accepted standard is a REF or XREF table. So, for example between Task and Project, you'd have a table that has an ID for the table, a Foreign Key for a Task, and a Foreign Key for a Project.

Basically, you're associating the project and task by ID, and just will add a new entry every time you need a new association. If there's information specifically about that relationship, it will live in this table with the relationship.

Adam Wells
  • 545
  • 1
  • 5
  • 15