1

I have a table that has to store millions of posts (in the near future) in mysql. Here is the simplified structure (I didn't indicate my primary keys because my question is based on that):

CREATE TABLE `posts` (
    `post_id` INT NOT NULL AUTO_INCREMENT,
    `user_id` BIGINT(20) NOT NULL,
    `title` VARCHAR(100),
    `content` TEXT
) ENGINE = MyISAM;

My question is: What is the best way to define my keys?

  1. Is it best to use my AUTO_INCREMENTing 'post_id' only as a primary and a unique key considering the amount of records that needs to be stored?

  2. Must I use both 'post_id' and 'user_id' as a composite key to use as a primary and unique key? If this is best, how do I use it in other tables as a foreign key? Do I simply add them as columns in those tables?

Could you please also indicate the advantages and disadvantages (if any) of each and perhaps some advice on which ENGINE to use. I think Innodb will be best if I use the second option. I don't know.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
Sthe
  • 2,575
  • 2
  • 31
  • 48
  • possible duplicate of [What's the best practice for primary keys in tables?](http://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables) – millimoose Jan 08 '12 at 19:51
  • Also, theoretically, there shouldn't be a significant performance hit in accessing a multiple-column index compared to a single-column one, if all the columns are easily hashable (e.g. integers). – millimoose Jan 08 '12 at 19:54
  • 1
    you might find a previous answer of mine of interest http://stackoverflow.com/questions/4419499/mysql-nosql-help-me-to-choose-the-right-one-on-a/4421601#4421601 – Jon Black Jan 08 '12 at 20:32
  • Thank you f00. That is a brilliant answer and it addresses my question. +1 from me :-) – Sthe Jan 08 '12 at 20:57

1 Answers1

1

Whether you use an auto-incremented field as a primary key or a composite key using post_id and user_id will come down to basically the following:

If you have child tables to your posts table, will you ever want to query on those tables using the user-id of the post?

For example, if other uses are allowed to comment on a post, and you have a comments table, do you see reasons why you would want to get data from the comments table where you query on the user_id of the original post?

If so, by using a auto-incremented field, you are always going to have to join on the parent table (posts) in order to query on data on your child table based on the user_id:

SELECT comments.* 
FROM comments
INNER JOIN posts ON
    posts.post_id=comments.post_id
WHERE posts.user_id='scott.korin'

This can cause a performance hit, especially if you expect millions of rows of data in the posts table.

If you don't have the need to query on child tables using the user_id field, then I would use the auto-incremented post_id. Just make sure you define the field big enough. (if you except millions of records, you don't want to get stuck with only a maximum of a few millions records because you made the post_id field too small).

scott.korin
  • 2,537
  • 2
  • 23
  • 36
  • Thank you. That really helps. Just a follow up question: Are AUTO_INCREMENT primary keys reliable? Do they have any issues I might not know about? I didn't intend to use `user_id` to retrieve comments. I was just making an example and I could not find any other field to use. Sorry about that. – Sthe Jan 08 '12 at 20:38
  • As far as I know, they are pretty reliable, but then I am more versed in their use in Microsoft SQL Server :) – scott.korin Jan 08 '12 at 20:49