12

In an application similar to StackOverflow that I am building, I am trying to decide what relationship my Questions, Answers and Comments tables should have.

I could have Questions and Answers both be represented by a single table Posts.

That would allow Comments to have a single foreign key to Posts.

But if Questions and Answers are separate tables, what relationships should Comments have to each of these?

UPDATE: Although the chosen answer recommends a Class Table Inheritance approach and this seems like the best approach in database terms, this option is not supported by the Rails ORM. So, in Rails my models will have to use Single Table Inheritance and will probably look like this:

class Post < ActiveRecord::Base  
end  

class Question < Post  
  has_many :answers, :foreign_key => :parent_id  
  has_many :comments, :foreign_key => :parent_id  
end  

class Answer < Post  
  belongs_to :question, :foreign_key => :parent_id  
  has_many :comments, :foreign_key => :parent_id  
end  

class Comment < Post  
  belongs_to :question, :foreign_key => :parent_id  
  belongs_to :answer, :foreign_key => :parent_id  
end


class CreatePosts < ActiveRecord::Migration  
    def self.up  
      create_table :posts do |t|  
        t.string :type 
        t.string :author   
        t.text :content  
        t.integer :parent_id   
        t.timestamps  
      end  
    end  


    def self.down  
      drop_table :posts  
    end  
end
CREATE TABLE "posts" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,  
  "type" varchar(255),  
  "author" varchar(255),  
  "content" text,  
  "parent_id" integer,  
  "created_at" datetime, 
  "updated_at" datetime
  );
Tiny
  • 27,221
  • 105
  • 339
  • 599
Charlie K
  • 261
  • 2
  • 8
  • 11
    It seems like bad form to ask the site you're trying to replace for help replacing it... – samoz Jun 12 '09 at 16:30
  • 5
    @samoz: Nah, people talk about Stack Overflow clones here all the time, I mean CNProg (http://code.google.com/p/cnprog/) is one of those. :-P – C. K. Young Jun 12 '09 at 16:32
  • My hope is that he's not actually trying to replace it, but rather needs something that's internal to his company only, or that he is simply using this as an example. – Orion Adrian Jun 12 '09 at 16:32
  • Possible duplicate to your own question? http://stackoverflow.com/questions/987063 – DanSingerman Jun 12 '09 at 16:33
  • 2
    Maybe SO should open source their design not the code. Imitation is the best form of flattery... – Ryan Oberoi Jun 12 '09 at 16:34
  • Was it in one of the podcasts that Jeff Atwood indicated that Comments and Answers were represented in the same table, separate from the Questions table? – artlung Jun 12 '09 at 16:50

6 Answers6

17

I'd go with the Posts approach. This is the best way to ensure referential integrity.

If you need additional columns for Answers and Questions respectively, put them in additional tables with a one-to-one relationship with Posts.

For example, in MySQL syntax:

CREATE TABLE Posts (
  post_id     SERIAL PRIMARY KEY,
  post_type   CHAR(1),              -- must be 'Q' or 'A'
  -- other columns common to both types of Post
  UNIQUE KEY (post_id, post_type) -- to support foreign keys
) ENGINE=InnoDB;

CREATE TABLE Comments (
  comment_id  SERIAL PRIMARY KEY, 
  post_id     BIGINT UNSIGNED NOT NULL,
  -- other columns for comments (e.g. date, who, text)
  FOREIGN KEY (post_id) REFERENCES Posts(post_id)
) ENGINE=InnoDB; 

CREATE TABLE Questions (
  post_id     BIGINT UNSIGNED PRIMARY KEY,
  post_type   CHAR(1),              -- must be 'Q'
  -- other columns specific to Questions
  FOREIGN KEY (post_id, post_type) REFERENCES Posts(post_id, post_type)
) ENGINE=InnoDB;

CREATE TABLE Answers (
  post_id     BIGINT UNSIGNED PRIMARY KEY,
  post_type   CHAR(1),              -- must be 'A'
  question_id BIGINT UNSIGNED NOT NULL,
  -- other columns specific to Answers
  FOREIGN KEY (post_id, post_type) REFERENCES Posts(post_id, post_type)
  FOREIGN KEY (question_id) REFERENCES Questions(post_id)
) ENGINE=InnoDB;

This is called Class Table Inheritance. There's a nice overview of modeling inheritance with SQL in this article: "Inheritance in relational databases."

It can be helpful to use post_type so a given Post can be only one answer or one question. You don't want both an Answer and a Question to reference one given Post. So this is the purpose of the post_type column above. You can use CHECK constraints to enforce the values in post_type, or else use a trigger if your database doesn't support CHECK constraints.

I also did a presentation that may help you. The slides are up at http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back. You should read the sections on Polymorphic Associations and Entity-Attribute-Value.


If you use Single Table Inheritance, as you said you're using Ruby on Rails, then the SQL DDL would look like this:

CREATE TABLE Posts (
  post_id     SERIAL PRIMARY KEY,
  post_type   CHAR(1),              -- must be 'Q' or 'A'
  -- other columns for both types of Post
  -- Question-specific columns are NULL for Answers, and vice versa.
) ENGINE=InnoDB;

CREATE TABLE Comments (
  comment_id  SERIAL PRIMARY KEY, 
  post_id     BIGINT UNSIGNED NOT NULL,
  -- other columns for comments (e.g. date, who, text)
  FOREIGN KEY (post_id) REFERENCES Posts(post_id)
) ENGINE=InnoDB; 

You can use a foreign key constraint in this example, and I recommend that you do! :-)

Rails philosophy tends to favor putting enforcement of the data model into the application layer. But without constraints enforcing integrity at in the database, you have the risk that bugs in your application, or ad hoc queries from a query tool, can harm data integrity.

Tiny
  • 27,221
  • 105
  • 339
  • 599
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • When I mentioned the Posts approach, I meant a single table called Posts that has a column called "type" which could contain the value "Question" or "Answer". Is your design called "Single Table Inheritance"? – Charlie K Jun 12 '09 at 17:03
  • No, STI is something different. The design above is called Class Table Inheritance. I'll update my answer above. – Bill Karwin Jun 12 '09 at 17:31
  • Wow, this is fantastic. I'll definitely check out the slideshow. 250 slides! You're not short of ideas about database design, are you? I was reading in another answer of yours where you say "These dilemmas are all due to the fact that Polymorphic Associations depends on using data (i.e. a string value) to refer to metadata (a table name)." over here: http://stackoverflow.com/questions/922184/why-can-you-not-have-a-foreign-key-in-a-polymorphic-association/922341#922341 Doesn't post_type in your example above also use a string value ('A' or 'Q') to refer to metadata (Questions or Answers table)? – Charlie K Jun 12 '09 at 17:56
  • Yes, that's a good point. But in this case, it's being used to assist referential integrity among disjoint subtypes in the Class Table Inheritance, instead of subverting referential integrity in the Polymorphic Association. – Bill Karwin Jun 12 '09 at 18:11
  • Re. the slides, yes, it was a 3 hour tutorial I presented at the MySQL Conference. Also doing a shorter session at OSCON in July. This all based on material I'm developing for my book "SQL Antipatterns" that I hope to get published later this year. – Bill Karwin Jun 12 '09 at 18:13
  • Any chance of putting up the video of your presentation? By the way, thanks for the inheritance article - http://everything2.com/title/Inheritance%2520in%2520Relational%2520Databases - that was great. – Charlie K Jun 12 '09 at 20:41
  • Sorry, AFAIK there was no video of my presentation recorded at MySQL Conference. – Bill Karwin Jun 12 '09 at 21:27
  • Upon further research it seems there is no support for Class Table Inheritance in Ruby on Rails, which is the technology I am using. So it seems I have to choose between Single Table Inheritance and Polymorphic Associations. After reading your slides, I'm going to go out on a limb and guess that your preference would be for Single Table Inheritance over Polymorphic Associations, in the absence of Class Table Inheritance as a choice. It appears I will also have to forgo some of the key constraints that you have recommended above - something I'm not happy about but I don't seem to have a choice. – Charlie K Jun 13 '09 at 18:27
  • Why wouldn't you be able to use foreign key constraints if you're using Single Table Inheritance? I though you would be able to -- and should. I'll add an example above... – Bill Karwin Jun 13 '09 at 21:20
  • Thanks. Yes, I can certainly back up the application-level foreign keys by implementing them in the database as you suggest in your edit. What I meant, though, is that I seem to need to omit the aggregate keys you recommended - UNIQUE KEY (post_id, post_type) in Posts; FOREIGN KEY (post_id, post_type) REFERENCES Posts(post_id, post_type) in Questions; and FOREIGN KEY (post_id, post_type) REFERENCES Posts(post_id, post_type) in Answers. I don't think those are going to work. But a foreign key from Comments to Posts on post_id shouldn't interfere since the application has already declared it. – Charlie K Jun 13 '09 at 21:33
  • I edited my question to indicate what the STI version would look like as a Rails model. And I agree that the application's "foreign key" should be backed up by an actual database constraint. – Charlie K Jun 13 '09 at 21:40
  • Right; if you use STI you don't need the compound unique key that I showed. You don't necessarily need it even for Class Table Inheritance. It's just an extra way to make sure that a given row in Posts isn't referenced by both Questions and Answers at the same time. – Bill Karwin Jun 13 '09 at 22:17
  • Ok, I see, they're just an extra layer of validation. But if I implement the compound unique keys in the database independently of the Rails application, could you envision a situation where that might cause a problem for the application? – Charlie K Jun 13 '09 at 22:23
  • I suppose in theory the extra unique constraint wouldn't cause a problem, since the Rails application shouldn't insert any legitimate rows that would violate the constraint. But I haven't tried it in Rails (I seldom use Rails), so I can't say from experience. – Bill Karwin Jun 14 '09 at 03:46
  • If you delete a Question or an Answer, wouldn't the Post and Comments still stick around? Would you have to have a trigger to handle deletes? – Joe Van Dyk Feb 02 '12 at 00:40
  • @JoeVanDyk: Good point, you'd probably want to delete the Post instead, and let cascading foreign keys remove the rows from referencing tables. – Bill Karwin Feb 02 '12 at 03:53
  • @BillKarwin What happens when Questions and Answers can have Votes? You'd create a Votables table that did the same thing the Posts table, right? If you want to delete a Question or an Answer, do you have to manually delete both the associated Post and Votable? – Joe Van Dyk Feb 02 '12 at 07:20
  • It seems to me that using an exclusive arc (so you'd have posts.question_id and posts.answer_id and a check constraint that ensures only one can be not null) is the best option here -- the only complexity is the check constraint. – Joe Van Dyk Feb 02 '12 at 07:32
  • @JoeVanDyk: For votes, I'd just make a Votes child table of Posts, it basically a many-to-many relationship to Users, because each user can vote on a given Post only once. Regarding your exclusive arc idea, that works fine (except for MySQL which doesn't support check constraints) until you need to add a third subtype of Posts besides Questions and Answers. That would require adding a new column, and as you add columns the expression in the check constraint gets *very* complex. – Bill Karwin Feb 02 '12 at 14:22
  • I don't think the check constraint needs to be that complex: check ((table1_id is not null)::integer + (table2_id is not null)::integer + (table3_id is not null)::integer = 1) – Joe Van Dyk Feb 02 '12 at 23:20
  • @BillKarwin I really like this explanation and example. Would you use this approach if your system only had the concept of topics and post? – Jaigus Nov 27 '12 at 07:41
  • @Jaigus, The relationship between topics and posts (if it's like "tags" on StackOverflow) is many-to-many. So you need a topics table and a many-to-many intersection table. – Bill Karwin Nov 27 '12 at 12:39
  • @BillKarwin Oh no I mean in the traditional forum style(like say, vbulliten), where each post belongs to one topic. Would it be better to place them into one table like stackoverflow has done with the questions and answers; or just leave them as separate tables? – Jaigus Nov 27 '12 at 22:07
  • @Jaigus, take a look at Slashdot code. Everything's a post. Each post may be a reply, so it references its "parent." But all posts also reference the top-level post of the thread so that effectively identifies topics. – Bill Karwin Nov 27 '12 at 22:28
4

In the social networks that I build I do something a bit different. If you think about it a comment could be attached to just about any Entity in a site. This could be a blog post, a forum thread or post, an article, someones picture, a persons profile, a vendor of a service, etc. For this reason I create a SystemObjects table which holds the object type (table reference). For the most part I create records for the Entities of my system that will accept comments...but these map directly to my tables. The SystemObjects table houses the SystemObjectID, and a friendly name for future reference (a lookup table).

With this in place I then create a Comments table which has the SystemObjectID reference to tell me what table to go look in. Then I also house the SystemObjectRecordID which tells me which PK of the referenced table I am interested in (along with all the standard comment data).

I use this notion of the SystemObject table for many other generic far reaching concepts in my sites. Think about Tags, Ratings, Comments, and any other dangling fruit that might be attached across your site and aggregated up for quick use.

Read more about this in my book ASP.NET 3.5 Social Networking.

Andrew Siemer
  • 10,166
  • 3
  • 41
  • 61
2

You could create a single comments table with two foreign keys, one to questions.questionID and the other to answers.answerId

Nathan Koop
  • 24,803
  • 25
  • 90
  • 125
  • um... it's not really complicated, you just create the FK's, and when you create a comment on a answer you just populate the answer's id into the correct column. – Nathan Koop Jun 12 '09 at 16:40
  • 2
    Yep, this is called an "exclusive arc," where exactly one of the foreign keys can be non-NULL. – Bill Karwin Jun 12 '09 at 16:43
  • @Bill, thanks, I was unaware of this term. I looked here and noticed http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-appdevelopers where it is listed as a bad practice. Would you agree? – Nathan Koop Jun 12 '09 at 16:57
  • 1
    I don't think exclusive arcs are bad practice. That linked article says exclusive arcs are hard to maintain since you have to have to ensure that only one key is not null. You can do that with (in postgres): check ((table1_id is not null)::integer + (table2_id is not null)::integer + (table3_id is not null)::integer = 1) – Joe Van Dyk Feb 02 '12 at 07:50
1

You would need two domain tables that bring the relationships together CommentsForQuestions and CommentsForAnswers. Basically you're going to need to create 5 tables for this purpose:

Questions
Answers
Comments
CommentsForQuestions (relates comments to questions)
CommentsForAnswers (relates comments to answers)

The one problem this has is that it's inferior to the posts idea because referential integrity isn't as strong. I can garuntee that CommentsForQuestions connects to a comment and a question, but I can't prevent both a question and an answer from connecting to the same comment.

Orion Adrian
  • 19,053
  • 13
  • 51
  • 67
  • If I need to search across all comments, though, doesn't this complicate the search query? – Charlie K Jun 12 '09 at 16:33
  • No, these are domain tables. You're creating a many-to-many relationship between comments and questions; and comments and answers. – Orion Adrian Jun 12 '09 at 16:34
  • CommentsForQuestions and CommentsForAnswers are just link tables - there is only one comments table – DJ. Jun 12 '09 at 16:38
  • Any issues with RI would be minor. Yeah, it might be a little odd if that link exists, but does it bring down the application or corrupt your data? Not really.... –  Jun 12 '09 at 16:46
  • Is this called a Polymorphic Association? – Charlie K Jun 12 '09 at 16:58
1

A foreign key relationship; you can either have QuestionComments and AnswerComments, or you can have Comments have a Foreign key column for both Questions and Answers (and have those columns be exclusive).

Personally, I'd go with the Posts approach.

Edit: On consideration, there's a third approach that might work; you might have a Comments table, and then just have an association table that associates the Comments with either a Question or an Answer (so Comments would have an ID and the comment, the join table would have a CommentID, an AnswerID, and a QuestionID). Or, you could have just a Comments table, then have an Answer-Comment association table, and a separate Question-Comment association table.

Paul Sonier
  • 38,903
  • 3
  • 77
  • 117
1

There are 2 ways that I can think of.

First, use another column in the Comment table to indicate whether the comment belongs to Question or Answer. So the PK of the Comment table becomes wher PostID is the foreign key to Question or Answer and PostType can be someting like 1=Question and 2=Answer.

Second, use a relationship table for each Question and Answer. So you have a Question, Answer, Comment, QuestionComment, and AnswerComment table.

Let's say the primary key of Question, Answer, Comment tables are QuestionID, AnswerID, and CommentID respectively. Then the columns of QuestionComment would be [QuestionID, CommentID]. Similarly, the columns of AnswerComment would be [AnswerID, CommentID].

David
  • 5,356
  • 2
  • 26
  • 39
  • 1
    Your first solution above is called "polymorphic associations" but I don't think it's a good design. You can't create a real foreign key constraint, because PostID can reference either of two parent tables, depending on the value of PostType in the respective row. Foreign keys don't work this way, and if your design requires you to say "omit the referential integrity constraint" then that's a strong code smell. – Bill Karwin Jun 12 '09 at 16:45