32

I'm designing a database for a web site that will have at least 4 different object types represented (articles, blog posts, photos, stories), each of which have different enough data requirements to warrant their own tables. We want users to be able to post comments for any of these types. The data requirements for comments are simple and independent of the type of thing the comment regards (ie just a comment body, and the author's email).

I want to avoid the redundancy of creating and managing 4+ separate tables for the comments, so I'd like to be able to hold all comments in one table, possibly specifying the relation via 2 columns: one to designate the parent entity and one for the parent row Id.

but I don't understand how, then, I would implement foreign keys, since foreign keys establish a relation between 2 and only 2 tables (right?).

So with all that in mind, what would be the best approach?

Rui Jarimba
  • 11,166
  • 11
  • 56
  • 86
Faust
  • 15,130
  • 9
  • 54
  • 111

3 Answers3

45

Here's one way to implement supertype/subtype tables for your app.

First, the supertype table. It contains all the columns common to all subtypes.

CREATE TABLE publications (
  pub_id INTEGER NOT NULL PRIMARY KEY,
  pub_type CHAR(1) CHECK (pub_type IN ('A', 'B', 'P', 'S')),
  pub_url VARCHAR(64) NOT NULL UNIQUE,
  CONSTRAINT publications_superkey UNIQUE (pub_id, pub_type)
);

Next, a couple of subtype tables.

CREATE TABLE articles (
  pub_id INTEGER NOT NULL,
  pub_type CHAR(1) DEFAULT 'A' CHECK (pub_type = 'A'),
  placeholder CHAR(1) NOT NULL, -- placeholder for other attributes of articles
  PRIMARY KEY (pub_id, pub_type),
  FOREIGN KEY (pub_id, pub_type) REFERENCES publications (pub_id, pub_type)
);

CREATE TABLE stories (
  pub_id INTEGER NOT NULL,
  pub_type CHAR(1) DEFAULT 'S' CHECK (pub_type = 'S'),
  placeholder CHAR(1) NOT NULL, -- placeholder for other attributes of stories
  PRIMARY KEY (pub_id, pub_type),
  FOREIGN KEY (pub_id, pub_type) REFERENCES publications (pub_id, pub_type)
);

The CHECK() and FOREIGN KEY constraints in these subtype tables prevent rows from referencing the wrong kind of row in the supertype. It effectively partitions the pub_id values among the subtypes, guaranteeing that any given pub_id can appear in one and only one of the subtype tables. That's why you need either a PRIMARY KEY or NOT NULL UNIQUE constraint on the pair of columns {publications.pub_id, publications.pub_type}.

The table for comments is simple. Given that it is to have the same structure for all subtypes, you can reference the supertype.

CREATE TABLE comments (
  pub_id INTEGER NOT NULL REFERENCES publications (pub_id),
  comment_timestamp TIMESTAMP NOT NULL DEFAULT now(),
  commenter_email VARCHAR(10) NOT NULL, -- Only allow people who have 
                                        -- really short email addresses
  comment_text VARCHAR(30) NOT NULL,    -- Keep 'em short!
  PRIMARY KEY (pub_id, comment_timestamp, commenter_email)
);

Add a little bit of data.

INSERT INTO publications VALUES
(1,'A', 'url 1 goes here'),
(2,'A', 'url 2 goes here'),
(3,'S', 'url 3 goes here');

INSERT INTO articles VALUES
(1,'A', 'A'),
(2,'A', 'B');

INSERT INTO stories VALUES
(3,'S', 'A');

INSERT INTO comments VALUES
(1, now(), 'a@b.com','You''re stupid'),
(1, now(), 'b@c.com', 'You''re stupid, too!');

Now you can create a view to show all articles and resolve the join. You'd do the same for each of the subtypes.

CREATE VIEW articles_all AS
SELECT P.*, A.placeholder
FROM publications P
INNER JOIN articles A ON (A.pub_id = P.pub_id)

You might prefer names like "published_articles" instead of "articles_all".

To select one article and all its comments, you can just left join the two tables. (But see below why you probably won't do that.)

SELECT A.*, C.*
FROM articles_all A
LEFT JOIN comments C ON (A.pub_id = C.pub_id)
WHERE A.pub_id = 1;

You'd probably not actually do that for a web interface, because the dbms would have to return 'n' copies of the article, where 'n' equals the number of comments. But it does make sense to do this in some applications. In applications where it makes sense, you'd use one updatable view for each subtype, and application code would use the updatable views most of the time.


The more common business application of a supertype/subtype involves "Parties" (the supertype), "Organizations" and "Individuals" (the subtypes, informally companies and people. Addresses, like "comments" in the example above, are related to the supertype, because all the subtypes (organizations and individuals) have addresses.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • The key thing (pardon the pun) about your solution is the fact that the PK of the subtype tables is also an FK to the supertype table. – Walter Mitty Feb 12 '11 at 10:50
  • Rather than using values like 'S' in the pub_type column, would it not make sense to use the whole relevant table-name? It's not like the table-names are going to change often. I would think that would help with system readability, as well as dynamic generation of SQL for the join. Or is there any reason that would be a bad idea? – Faust Mar 09 '11 at 09:55
  • 7
    @Snake: It's not a bad idea, but it's not necessary for joins, and you don't need dynamic SQL. (Dynamic SQL has risks that are better avoided if possible.) A supertype/subtype design for 'n' subtypes maps to 'n'+1 base tables and 'n' views. Each of the 'n' views joins the supertype table to one of the subtype tables; then clients use the views, not the base tables. (Write triggers, if necessary, to let you do INSERT, UPDATE, DELETE through the views.) Once you have a well-named, updatable view, you don't need to read that column yourself. It's just to help SQL maintain data integrity. – Mike Sherrill 'Cat Recall' Mar 09 '11 at 11:26
  • 3
    @Catcall: Thanks. Gotcha clear on the use of views. I kinda knew that but didn't think thru the question well enough. Another question: My plan would be to rely on the Publications table to generate the Pub_Id -- then this gets copied to the sub-type table, therefore the value will be unique within each table, so I wouldn't need to have a compound primary key on Publications table. Does that make sense? – Faust Mar 09 '11 at 15:57
  • 7
    @Snake: I understand what you're saying, but you really need that column and the compound key if you're using a SQL dbms. That CHAR(1) column, implemented in the supertype and every subtype and used as part of both a compound key and a foreign key reference, guarantees that each row in the supertype table can join to one and only one row in one and only one subtype table. Without it, you could insert the same pub_id in every subtype table, which makes the supertype/subtype design useless. – Mike Sherrill 'Cat Recall' Mar 09 '11 at 16:15
  • @Catcall: Ah, ok. -- I knew duplicating pubtype in the sub-type table served that purpose somehow, but it's becomming clearer, now, how that works. – Faust Mar 10 '11 at 09:02
  • 3
    @CatCall: So back to the last question: given the role pubtype has in the primary key, is there a performance reason, or other, to keep it to 1 character? And if so, would an int be even better? (BTW: thanks for your time and patience. And pls excuse my "density" on these things. My experience is primarily front-end, but I'm a one-man project this time; though, fortunately, I've lobbied successfuly for lots of time so I can think things thru carefully.) – Faust Mar 10 '11 at 09:11
  • 6
    @Snake: CHAR(1) is long enough to be readable enough, and it takes less space than an integer. You might get marginally better performance with CHAR(1), or you might get marginally better performance with an integer. But I generally prefer text to numbers, because I find it easier to read text. (A)rticle, (B)log post, (S)tory is easier to remember than (1) article, (2) blog post, (3) story. I don't mind spending time; you shouldn't mind upvoting every one of my replies. And my original answer, too. ;) – Mike Sherrill 'Cat Recall' Mar 10 '11 at 11:10
  • @CatCall: I just now got the privilege (15 pts) to vote things up. – Faust Mar 10 '11 at 22:23
  • 1
    @CatCall: in your sample code, you have a composite, primary key for the subtypes with pub_id & pub_type, but the super-type has only pub_id for the primary key -- is that what you meant, or should the super-type also have a composite, primary key? thx. – Faust Mar 25 '11 at 14:59
  • 2
    @Snake: In the absence of any other constraints, a composite key in the supertype would allow the same ID number to be associated with two or more subtype tables. But the characteristic feature of supertype/subtype design is that each row in the supertype must associate with *only* one row in *only* one subtype table. So, in the absence of other constraints, a composite key like that would break the design. (But note that although the primary key is a single column, the alternate key is composite. SQL requires that for the FK constraints to work.) – Mike Sherrill 'Cat Recall' Mar 25 '11 at 16:15
  • @CatCall: So if I'm joining another, NON-sub-type table (tbl_x) to Publications, I beleive I should: 1) have both a Pub_id and a pub_type column in tbl_x as a FK referencing same cols in publications WHEN I want to ensure it only references one type of publication or 2) have just pub_id column in tbl_3 as a FK referencing only pub_id in publications WHEN tbl_x can reference any type -- because I can grab the type from the join if I need to know what it is. ... am I generally correct here? Or should I *always* need the 2 columns to join to publicatoins from a non-publications table? – Faust Mar 28 '11 at 14:26
  • 1
    @Snake: Close. First, ask "Does this relate to the supertype or to a subtype?" In my example, comments are the same regardless of the subtype, so `comments (pub_id) references publications (pub_id)`. If your new table relates only to articles, use `new-table (pub_id, pub_type) references articles (pub_id, pub_type)`. As the example above is written, you can't reference only `articles (pub_id)`, because the dbms doesn't know that `articles (pub_id)` is unique. But pub_id *is* unique; you can declare an additional UNIQUE constraint on `articles (pub_id)` and then reference `articles (pub_id)`. – Mike Sherrill 'Cat Recall' Mar 28 '11 at 14:39
  • 2
    I know this is an old thread, but FWIW - I have never in 15 of relational database programming come across a situation where using subtypes/supertypes in the database was beneficial in any meaningful way. In fact, it's often detrimental. In my experience, it leads to extremely confusing application code that is costly to maintain and extend. – Jonathan Stark Aug 19 '14 at 19:41
  • Don't get me wrong - I see the attraction... I love love love the idea of DRY code, even in the db layer. We all want to be lazy and inherit wherever we can. I get it. I've even gone so far as to create a table called "record" that contains fields common to all records (i.e., id, create_at, updated_at) and subtyped everything else from there. But in practice, your Very Clever Schema™ ends up costing you dearly every time a new application comes along. Even Codd would admit that denormalization is often the most practical approach. – Jonathan Stark Aug 19 '14 at 19:46
  • Don't overthink it - we devs are overly prone to that sort of thing: http://stackoverflow.com/questions/14415881/how-to-pair-socks-from-a-pile-efficiently – Jonathan Stark Aug 19 '14 at 19:49
7

You can use super-type/subtype in DB design to avoid that problem. Create a super-type for images, video, notes and then link to the super-type. Keep all common column in the super-type table.

Here are few links to several similar questions/answers with models:

Community
  • 1
  • 1
Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
3

In my opinion you are better off having 4+ separate tables for the comments. Or you could have join tables. One table for all the comments... ex: blog table, comment table, blog_comment table. this would allow you to have your foreign keys.

Blog
--------
Blog_id
{other fields}

Blog_Comment
--------------
Blog_id
Comment_id


Comment
------------
Comment_id
{other fields}
John Sobolewski
  • 4,512
  • 1
  • 20
  • 26
  • 1
    This is the design I would recommend. The last possibility (and whether you need something like this would depend upon your design) would be to add a UserID to each entry in either the comment table or the blog/Video/Etx tables, such the one could retrieve all comments for a specific user (again, if this fits your design requirments, and with some possible adjustments for specific design requirements . . . ). Where the USerID ended up would probably require some thought. – XIVSolutions Feb 12 '11 at 16:56