2

1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.

Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE
(sid = surrogate id)

My question is: this doesn't seem right, and I look at this for 2 days already and can't confidently move on. How can I optimize this or am I completely wrong?

jerrytouille
  • 1,238
  • 1
  • 13
  • 28
  • What (or who) is "sid"? Seems to me you're nearly there - a table of users(user_id*), a table of photos(photo_id*) and a table of votes (user_id*,photo_id*,vote). [* = (component of) PRIMARY KEY. – Strawberry Jan 14 '13 at 16:59
  • sid = surrogate id (auto-increment index PK) – jerrytouille Jan 14 '13 at 17:00
  • [Vote] shouldn't need a FK to the "owner" of the photo just to the user that voted, so photo_user_sid is redundant. Does that help? – CHill60 Jan 14 '13 at 17:02
  • the confusing part to me is user-photo (1..n) and photo-vote (1..n) while user-vote is also (1..n) (screenshot) – jerrytouille Jan 14 '13 at 17:07
  • @CHill60 - hmm that actually done by MySQL Workbench GUI tool... – jerrytouille Jan 14 '13 at 17:11
  • @CHill60 - that does help clear this up a bit, still hope to optimize this thing a bit more though. what is [Vote] prefix can i vote on your comment? if i could i would. – jerrytouille Jan 14 '13 at 17:30
  • [Vote] isn't a prefix - I was referring to the Vote table :-) – CHill60 Jan 15 '13 at 14:23

2 Answers2

1

The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.

Assuming that you will only allow one vote per user on each photo, the structure of the can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite index on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.

G-Nugget
  • 8,666
  • 1
  • 24
  • 31
  • actually all 'sid' are surrogate primary keys. 'user_id' is uniqid() generated, with 'photo_id' are saved for later purpose of use. they are not keys. 'photo_user_id' is generated by MySQL Workbench when 1..n from 'photo' to 'vote', it's not 'vote_id' which is the 'sid' of 'vote' table. – jerrytouille Jan 14 '13 at 17:25
  • I'm not sure what you mean by "surrogate primary keys", but you shouldn't need them. The way they are being used in your original schema is as the unique ID for each table, rendering the `user_id` columns useless. Why would you need an additional unique ID for each row in the tables? – G-Nugget Jan 14 '13 at 17:29
  • i was thinking the same thing about surrogate keys until i came across this huge argument here on stackoverflow (http://stackoverflow.com/questions/63090/surrogate-vs-natural-business-keys) and i think it makes sense. for why the extra unique id I use it for my local android sqlite db, which is the other side of the project i'm playing with. – jerrytouille Jan 14 '13 at 17:34
  • @jerrytouille So I read that question and I agree with what I've read: surrogate keys are useful if the normal unique key is complex/composite, but basically useless if the normal unique key is simple like an `INT`. The only table in your schema that could benefit from a surrogate key as I understand it is the `vote` table, but since it will never be referenced by the potential surrogate ID, there is no reason for it. As for the main tables, if you use an `INT` as the unique key, there is no reason to use a surrogate ID. – G-Nugget Jan 14 '13 at 17:41
  • I get all u stated except the last sentence which confused me a bit, the surrogate IDs (sid) are indeed being used as the primary unique keys in all tables. the 'user_id' and 'photo_id' are VARCHAR and uniqid() generated for my later need. How would you "make the primary key a composite index on (user_id, photo_id)" instead of using FKs? – jerrytouille Jan 14 '13 at 17:52
  • @jerrytouille Both the `sid` and `user_id` columns are unique. There's no reason to have two unique IDs for the same data unless you _really_ need a separate unique ID that cannot be an `INT`. – G-Nugget Jan 14 '13 at 17:54
  • @jerrytouille Let's continue this in a chat room: http://chat.stackoverflow.com/rooms/22714/the-g-spot – G-Nugget Jan 14 '13 at 17:57
  • hmm that's not true though i do have 'username' and 'email' are also unique and if i want to i can also use them as PKs but that's not it. anyway if u do it, how would u redesign this thing? (don't mean to drag this too long btw so thanks for keeping this up - lol now i see the chatroom invite) – jerrytouille Jan 14 '13 at 17:59
  • ok once again 'user_id' is uniqid() function generated (http://www.w3schools.com/php/func_misc_uniqid.asp) and it is not INT. it is just another 'unique' type field (like 'username') where i reserve for later use. same for 'photo_id'. i think u are missing the point here. anyway will return later to check on this. thanks for your sincere help, i mean it. – jerrytouille Jan 14 '13 at 18:11
  • @jerrytouille my point is that if you need a unique key later, just use the main unique key for the table, don't worry about creating another one. If you're making a key for future use, you can presumably choose what the key will looking like. An `INT` can be the unique key. If it is an int, then you would have two duplicate unique keys for the data. – G-Nugget Jan 14 '13 at 18:36
1

MySQL/InnoDB tables are always clustered (more on clustering here and here).

Since primary key also acts as a clustering key1, using the surrogate primary key means you are physically sorting the table in order that doesn't have a useful meaning for the client applications and cannot be utilized for querying.

Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require double lookup.

For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:

enter image description here

({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)

This physical model will enable extremely efficient querying for:

  • Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
  • Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.

If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.


1 In InnoDB. There are DBMSes (such as MS SQL Server) where clustering key can differ from primary.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • By "create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}" you mean adding an auto-increment index but make it to not be a PK, right? – jerrytouille Jan 15 '13 at 02:20
  • @jerrytouille This is not related to how values are generated, but what fields are included in the index. A regular index will include fields mentioned in WHERE/JOIN, but a covering index will _also_ include fields mentioned in the SELECT list. This will let the DBMS execute the query by accessing index _alone_, without touching the "main" table (which is either table heap or a B-Tree in the case of InnoDB). Please follow the link I have provided for thorough explanation. In the end, you'll have PK `{USER_ID, PICTURE_NO, VOTER_ID}` _and_ index `{VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}`. – Branko Dimitrijevic Jan 15 '13 at 08:54
  • hmm that's pretty cool, a bit out of my league so it takes some time to sink in. if i understand correctly your answer of the whole thing: eliminates all surrogate id (auto-incre index) PKs, use real business PKs ('user_id','photo_id'...), the only index to be used is the covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE} for the VOTE table. My question on that is: do we create the covering index as an extra column added to the table when we design it or is it something we do when running query? – jerrytouille Jan 15 '13 at 10:12
  • @jerrytouille _"eliminates all surrogate id"_ - essentially yes. Surrogates have [their purposes](http://stackoverflow.com/a/13964847/533120), but that probably doesn't apply here. _"only index to be used is the covering index"_ - there is an index under the PK as well. The secondary index is needed only if you need both "directions" of querying - otherwise just arrange the PK. _"do we create the covering index as an extra column"_ - no, the VOTE_VALUE needs to be in the table whether it is included in some index or not. _"something we do when running query"_ - no, index is created in advance. – Branko Dimitrijevic Jan 15 '13 at 11:44
  • ok, where do we place the 'created-in-advance covering-index'? is it in the VOTE table as an extra column besides {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}? – jerrytouille Jan 15 '13 at 12:26
  • let me rephrase the question: it's not an extra column per se, it's the index you create with the VOTE table that covers {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE} correct? Example: CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name) --> covering index ); – jerrytouille Jan 15 '13 at 13:20
  • @jerrytouille _"it's not an extra column per se, it's the index you create with the VOTE table that covers {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE} correct?"_ - yes. – Branko Dimitrijevic Jan 15 '13 at 14:39