-1

Working on an idea for a Question/Answer/Blog site. For the content of each of these, I can store them all in one table with some columns applying or not applying to each of these different types with a type column to differentiate each - OR, I can separate them into two tables Question/Blog, and Answer (or other combo), or into 3 tables one for each type.

In the one table idea, the columns would look like: id / heading / detail / type / qid

  • type column would differentiate each by being 'question' / 'blog' / 'answer'

  • qid would reference the id of the question for Answer

QUESTION: uses heading, detail, type

BLOG: uses heading, detail, type (qid matches question, if assigned as an answer, but not typical)

ANSWER: uses detail, type, qid (qid matches question id, does not use heading column)

There may be another column or two (not shown) that may apply to one type and not the other.

I think storing all in one table may make queries simpler where there is a relationship between them, but the table gets much larger quicker... What is a good approach to a database/table design like this with the expectation this community can grow quite large over time (10K to 100K active users)?

Some typical relationships:

A would be related to Q as being the answer(s) to Q. There can be multiple answers to Q. Q, A, B will all be listed on the same window with checkbox choices to show/hide Q&A or B or BOTH. Answers to Q can be associated with A or B (users can assign a blog as an answer but expect to be less often) Quantity of A will far outweigh all of them, with Q following and B the least.

I am leaning toward one table for Q/B and another table for A - BUT I don't have a good clear rationale for it. (Don't have enough experience to see things in terms of scalability, maintainability, normality, reliability, clarity, etc. and future impact) Maybe scalability and maintainability would be prioritized?

Thanks for your thoughts!

HDer
  • 385
  • 5
  • 17
  • 2
    You really need to clarify how question and blog are related to each other. Is a question just the same as a blog, only it can have answers and a blog can't? – gview May 02 '20 at 21:22
  • Does this answer your question? [How can you represent inheritance in a database?](https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database) – philipxy May 02 '20 at 22:40
  • @gview - yes, correct. The blogs are not related and can't have answers. Questions can have answers. – HDer May 02 '20 at 22:47
  • With the dual use solution I proposed, technically there is nothing stopping you at a database level from adding answers to a blog item, but your code should insure that doesn't happen. – gview May 02 '20 at 23:42

1 Answers1

0

I think storing all in one table may make queries simpler where there is a relationship between them, but the table gets much larger quicker... What is a good approach to a database/table design like this with the expectation this community can grow quite large over time (10K to 100K active users)?

Even a minimally resourced mysql server will be fine with tables with tens of millions of rows in them. That is not an excuse to ignore the basic principles of database normalization.

You should not conflate your core table design with performance tuning and optimization or scalability for that matter.

My educated guess

Question and Blog are essentially subtypes of the same entity. I'd use the same table, maybe calling it "content" or "item". Use a tinyint or char[1] column to designate if it's a blog or answer.

The "type specific" columns might warrant subtype tables with a defining relationship (sharing the key of the item table) that would allow for you to join and get those type specific attributes if you need them. That's more complicated to code for, and if you only have a handful of those attributes it would be simpler and probably not much overhead to just have them in the item table. Unused varchar() columns for example have no real cost if a row doesn't have them. Those columns must not be declared as not null because they are optional.

user
----
id (pk) unsigned integer
username varchar(100)
etc..

item
----
id (pk) unsigned integer
user_id (fk) (author of question/blog post)
type not null unsigned tinyint (1 = "blog", 2="question")
title varchar(100)
detail text
created_at timestamp

answer
------
id (pk) unsigned integer
user_id (fk) (stores user key)
item_id (fk) (stores parent item key)
details text
created_at timestamp

This is the basic skeleton that most systems of this type would have in the simplest form. It's predicated on a simple One to Many relationship (one Item can have Many Answers). If you consider it, an answer is not really different from a comment.

gview
  • 14,876
  • 3
  • 46
  • 51
  • Seems reasonable with a logical separation of items/answers. Thank you! – HDer May 03 '20 at 03:37
  • Here's a real quick MySQL performance tip -- make sure you set your tables to use the InnoDB engine if that is not the default. You want all your tables to be InnoDB and not MyISAM. – gview May 03 '20 at 03:49