What is the best database design when multiple tables share a common data model?

Question

It is very common coming up with the situation where you have multiple tables, let's say Posts, Entries and News and the users can comment in any of them.

The natural way to think this would look like would be something like this:

enter image description here

However, using the parent_type attribute to set up the primary key seems ambiguous to me, because it doesn't really describe clearly to which parent the comment belongs to.

It would seem to me like the clearest way to understand the relationship would be to create an intermediate table for each relationship: posts_comments, posts_entries, posts_news. But then again this doesn't have much sense in terms of database design because you don't really need an intermediate table for a one to many relationship, they are only needed for many to many relationships.

Then maybe a third approach would be to make a different Comment model depending on what it belongs to: PostComment, EntryComment. This would help in understanding better what that comment is for, but I think it is a terrible idea because a comment could just be represented by a single table and you could end up with a bunch of different tables with repeated column while it could just be represented by one table.

So, what is the usual approach for this situation?

the 3rd approach sounds fine to me. think like a DBA, not a programmer. if these are separate they can be managed (indexed, have storage allocated) separately, and i don't need an index to keep these straight. compared to that the redundancy in how the tables are defined is not a big deal. — Nathan Hughes, Jan 14 '15 at 17:15
possible duplicate of [Database design - articles, blog posts, photos, stories](http://stackoverflow.com/questions/4969133/database-design-articles-blog-posts-photos-stories) — Mike Sherrill 'Cat Recall', Jan 14 '15 at 17:18

score 2 · Answer 1 · answered Jan 14 '15 at 19:44

I agree that you don't want to create three tables that are almost identical if you can help it. I've seen plenty of databases that do this and it's a pain, because any change is likely to affect all the variations, so you have to make the change three times instead of once, and there are three times as many queries to change. And sooner or later someone will make a change and not know he should make the same change to the other two tables or be in a hurry or forget, and then six months later someone else comes along and wonders if there's a reason why the three tables are subtly different or if this is just carelessness.

But you also don't want to have a column with an ambiguous definition. You especially don't want a foreign key that can refer to different tables depending on the content of a type field. That would be really ugly.

A solution to a similar problem that I used once was to create an intermediate table. In this case, it would be -- I don't know if you have a word that encompasses news, posts, and events, so let me call them all collectively "articles". So we create an article_comments table. It may have no data except an ID. Then news, posts, and events all have a pointer to article_comments, and comments has a pointer to article_comments.

So if you want all the comments for a given news record, it's:

select whatever
from news n
join article_comments ac on ac.iarticle_comments_id=n.article_comments_id
join comments c on c.article_comments_id=ac.article_comments_id
where n.news_id=@nid

Note that with this structure, all FKs are true FKs to a single table. You don't make article_comments point to news, posts, and events; you make news, posts and events point to article_comments, so all the FKs are clean.

Yes, it's an extra table to read, which would slow down queries a bit. But I think that's a price worth paying to keep the FKs clean.

One admittedly clumsy query with this structure would be if you want to know which article a given comment is for if you don't know the type of article. That would have to be:

select whatever
from comment c
join article_comment ac on ac.article_comment_id=c.article_comment_id
left join news n on n.article_comment_id=ac.article_comment_id
left join post p on p.article_comment_id=ac.article_comment_id
left join event e on e.article_comment_id=ac.article_comment_id
where c.comment_id=@cid

and then see which of news, post, and event turns up non-null. (You could also do it with a join to a subquery that's a union, but I think that would be uglier.)

@dejell That's a decision you have to make. The performance penalty would be small, but it would not be zero, and if you have many accesses to the DB, it can add up. I'm happy to hear if someone has a solution that does not involve a performance penalty. — Jay, Mar 01 '17 at 16:03

score 0 · Answer 2 · answered Dec 01 '20 at 03:28

So later. However, you can design model as same below:

#Entity

uid: string
type: string/ (post, new,...)
...

#Comment:

uid: string
entity_id: string
body: String

Other way (only one table)

Entity:

uid: string
parent: string
type: string /post, new, comment, revison,...
metadata: jsonb

Hope useful for anyone!

What is the best database design when multiple tables share a common data model?

2 Answers2

Entity: