0

I have a table called ResultIssue, and I want each of the rows of this table to link to exactly one row from one of many other tables, lets call them ResultSource1, ResultSource2 etc. ResultSources all contain different data so they can't be cleanly merged. They all generate the same type of Issue though, which is what the ResultIssue table is for.

How can I build my database to achieve this in a sensible way?

I realise I could just create a ResultIssue table for each source and Union all related rows together to generate my issue list, but I'm wondering if there is a more elegant solution. This would be similar to having a column for each ResultSource in ResultIssue, and then creating a rule to ensure only one of them is >-1, and then querying this also becomes a nightmare.

I have also been considering a link table for each ResultSource, but I'm not sure how I could force the one result requirement.

I have also considered using an enum, but this feels like it violates some architectural principle, and I'm not sure how I'd query it. (i.e. have columns called LinkedTable and LinkedTableID)

I'm sure I'm not the only one to have this specific issue, so I'm hoping someone can point me at the solution.

Apologies in advance if this question has been asked already, I have been searching for a while, and I don't have the language to express this concisely.

Edit:

Adding a column to each of the ResultSource tables won't solve this either, as each source row can generate multiple issues. I also wouldn't be able to verify that multiple ResultSource tables don't link to the same ResultIssue

Andy Wynn
  • 1,171
  • 7
  • 11

1 Answers1

1

Your Sources are in an exclusive subtype relationship. Every Source is of exactly one subtype, where the subtypes are structurally different.

This broadly falls under patterns for "relational inheritance", which is the term you will want to google if you want more info. Also see:

How can you represent inheritance in a database?

Subclassing in relational database

How to do Inheritance Modeling in Relational Databases?

As you will see from those links, there are several patterns for transforming objects with subtypes into relational facts*. Choosing between them depends on your exact requirements, but here is one that is both extensible and well structured, and which I would typically consider my "default" choice (identity columns and clustering choices omitted since those are contingent).

create table Sources
(
   sourceId int not null unique,
   sourceType int not null,
   constraint pk_soruces primary key (sourceId, sourceType),
   constraint ck_sources_type check (sourceType in (1, 2, 3))
);

create table Type1Sources
(
   sourceId int not null unique,
   sourceType int not null constraint df_type1sources_type default (1),      
   constraint pk_type1sources primary key (sourceId, sourceType),
   constraint fk_type1sources_sources foreign key (sourceId, sourceType) references Sources,
   constraint ck_type1sources_areType1 check (sourceType = 1)
);

-- create table Type2Sources (...);
-- create tabel Type3Sources (...);

Notice that in Sources the sourceId is unique by itself, but in order to ensure two different subtypes can't refer to the same sourceId, we also add a sourceType attribute.

We then constrain each subtype table, such as Type1Sources, to exactly one sourceType attribute value with a check constraint, and create the foreign key to Sources on the combination {sourceId, sourceType}.

In my example I have a check constraint on Sources.sourceType that allows up to three subtype tables to be created. You can of course add more if needed.

Now that your Sources are represented, all we need to do is create an Issues table, and ensure that every Issue is associated with exactly one Source:

create table Issues
(
   issueId int not null,
   sourceId int not null,
   sourceType int not null,
   constraint pk_Issues primary key (issueId),
   constraint fk_Issues_Sources foreign key (sourceId, sourceType) references Sources
);

Here I have included the sourceType column on the Issues table, but this is not logically necessary: you could always just join from Issues to Sources on sourceId alone, and get the sourceType that way. Recall that sourceId is unique on Sources, so we can create a foreign key from Issues to Sources on just the sourceId column if we want.

The reason why I have included sourceType on Issues in this example is that it lets me quickly (ie, without a join) determine the type of source that created the issue.

Whether you choose to do that or not really just depends on whether you want to store the common attributes for all of the different SourceTypes in the Sources table, or whether you want to replicate those common attributes to each of the subtype tables and have the Sources table acting merely as a constraint structure to ensure data integrity. In the former case you will always be joining Issues to Sources if you need those attributes, so there's less incentive to put the sourceType on the Issues table.


* Rows in a database don't represent "entities", they represent "true propositions about your universe of discourse", aka "facts you care about". Hence the "impedance mismatch" between object models and relational models.

allmhuran
  • 4,154
  • 1
  • 8
  • 27