1

I did some research but couldn't find a clear answer.

I have a simple many-to-many relational table, the user_id will be foreign and group_id will be foreign, is there a reason to add a PRIMARY KEY?

Dean
  • 7,814
  • 8
  • 30
  • 31
  • 2
    No, it doesn't. But you will need to set `INDEX` for both columns of the pivot-table. – tereško Nov 04 '14 at 14:30
  • not required, but indexes WILL help speed up the queries, and a primary key would prevent duplicate `user_id/group_id` tuples from being added to the m-t-m table. – Marc B Nov 04 '14 at 14:32
  • 1
    FYI, I would recommend adding a [junction table](http://en.wikipedia.org/wiki/Junction_table) which has both PKs and FKs. – Payam Nov 04 '14 at 14:37
  • 2
    I always add an auto-incrementing PK to tables like this...I don't think there is a clear cut yes or no to this question - more of a personal preference. I personally put an auto-incrementing PK on EVERY table. – Leslie Nov 04 '14 at 14:43
  • @MarcB: But it won't prevent adding the same `user_id/group_id` under a different PRIMARY KEY will it? – Dean Nov 04 '14 at 14:47
  • 2
    In theory, you do not need a PK. However, in practice it's a different matter. Since you're using MySQL, and MySQL is using InnoDB as its default engine, and InnoDB is using primary key to perform clustering (in mortal language - that's the thingy that magically makes it fast for us) - the lack of user defined PK will force InnoDB to create a hidden one. And it's going to be 8 byte integer, opposed to the usual 4 byte one. Therefore - no, you don't have to create it. But you will have it. You just won't be able to use it. TL;DR: create one yourself. – N.B. Nov 04 '14 at 14:53
  • @dean: if you had `primary key (user_id, group_id)`, then it'd be impossible to add two+ copies of the same user_id/group_id to that table. – Marc B Nov 04 '14 at 14:55
  • @MarcB: So your'e suggesting to simply create a primary key as a merge of the other two? Or is this a built in function in MySQL? – Dean Nov 04 '14 at 14:59
  • If you are not worried about potential performance hit, you can do what @MarcB said and create a primary key using `(user_id, group_id)`. However, because of how intricate beast our beloved InnoDB is, that might (in future) prove to be a bottleneck. However, if you are not dealing with tens of millions of records in that table, you can do what Marc suggested. The best of all world would be `auto_increment` primary key and `unique(user_id, group_id)`. – N.B. Nov 04 '14 at 15:02
  • @N.B. would you care to quantify the potential performance hit? Given that the table exists only for the purpose of joining the two other tables, you'll never be using that PK index you're creating anyway. It'll just be there to take up space. The multi-field primary key correctly describes the relationship. – Ian McLaird Nov 13 '14 at 18:00
  • @IanMcLaird - what I'm referring to is valid for InnoDB only. If no PK is specified, it will create a hidden one. If a PK that's not auto_increment exists, then you there exists a possibility that next inserted key is not larger than previous. If that's the case, fragmentation and rebalancing occurs and that kills performance heavily. That's why it's important for InnoDB tables to have auto incrementing primary key, even if it's not explicitly used. It just keeps things working fast and as intended. Also, it's not entirely wrong to have a PK you don't use, it just has to identify a row anyway. – N.B. Nov 14 '14 at 11:49
  • @Dean my answer should not accepted, if I'm understanding your question right. For a many-to-many relationship, there's no logical / modeling reason for there to be a PK. From a speed/efficiency perspective, any possible improvements(I don't there will be any) from having a meaningless PK are minuscule and in my opinion irrelevant – Don Cheadle Mar 30 '15 at 21:25

2 Answers2

2

I'm going to post a dissenting opinion from the accepted answer. You should not add an auto-increment primary key to join (or cross-reference) tables. Here's why.

First, you're not going to use it. The only reason this table exists is to track the many-to-many relationship between your users and groups. Adding the auto-increment field just adds another field that you're not going to read. In other words, the only time you're going to query this table is because you want to see which users belong to a group, or which groups a user is a member of. In both cases, this extra primary key table will not be used.

Secondly, if you add additional data to this table, the primary key would still be useless. Consider: If we add a field to indicate that a group membership is "primary" for a given user, we still need to know both the user and the group for this "primary-ness" to be meaningful.

Technically you never need an auto-incremented, "surrogate" key for any table. You just need a field or set of fields that you know to be unique (thus putting us in at least 2nd Normal Form). The trouble is that most of the time, we're very bad at predicting what will be sufficiently unique, so we add this extra key to protect ourselves from that problem. Cross-references are not one of the cases we're bad at predicting. It will never be the case that a user will be a member of the same group twice, and it will never be the case that a group will contain the same user twice. We know that combination to be unique, so the additional surrogate key adds nothing.

However, note that if you're adding fields to a table like this (other than the two foreign keys), you may actually be modeling something that's not really a many-to-many. You may actually be modeling one-to-many-to-one. For instance, if you're adding fields to the mapping of users to groups, you may actually be modeling a "Membership" entity. That might have earned a surrogate key, because it might start to make sense to have one user with different memberships to the same group at different times. But again, that's modeling something else than a many-to-many relationship.

Ian McLaird
  • 5,507
  • 2
  • 22
  • 31
  • What about a `many-to-many` relationship guarantees `it will never be the case that a user will be a member of the same group twice`? That's the key difference between our answers – Don Cheadle Nov 13 '14 at 19:46
  • I think I'm realizing that the fact I've been working in a Data Vault for months has really skewed my view. Because in a Data Vault, you track all past and current relationships which is why a many-to-many still has a surrogate key to differentiate difference occurences of the relationship, even if it's the same `user_id` and `group_id` at two different times – Don Cheadle Nov 13 '14 at 19:48
  • Yes. When you have a need to do that, the relationship isn't really a many-to-many anymore. The relationship table is now an entity in its own right. You have a `one-to-many` from `User` to `Membership` and a `one-to-many` from `Group` to `Membership`, but it no longer makes sense to talk about the `Group`s a `User` belongs to. Now you'd talk about their `Membership`s. – Ian McLaird Nov 13 '14 at 19:57
  • I'm not saying your perspective is wrong (thus I didn't down vote). It's just that the modeling requirements are different. You *do* need to use the proper design. – Ian McLaird Nov 13 '14 at 19:58
1

What if you ever want to do something more interesting with that table (let's call it table A), such as make a satellite off of A or have another many-to-many table connecting A and some other table B?

Might there be a time where you want a table to reference an occurrence of a certain user_id with a certain group_id?

Basically, if you ever want to be referencing table A, or table A is a parent of some other table, you need a primary key on table A. And in my mind, in a relational DB, it's very likely you'll want to be referencing it at some point.

So I say add an auto-increment PK (assuming you're okay with duplicate user_id, group_id rows, like @Leslie said in a comment.

Also, note that if you give it an auto-increment PK, it would still be worthwhile to put an index on user_id and an index on group_id for joining.

Other similar SO questions:

SQL - many-to-many table primary key

Should Many to Many Tables Have a Primary Key?

Community
  • 1
  • 1
Don Cheadle
  • 5,224
  • 5
  • 39
  • 54
  • It's worth noting that both of the questions you link to disagree with your answer. – Ian McLaird Nov 13 '14 at 17:53
  • Yeah, I disagree with the "accepted" answer in both cases, but the discussion is still there. Both answers are from around 2010 as well... I think people are more okay with surrogate PK's nowadays. I for one live by them as I work with a Reporting Database a lot. I think it's faulty to assume there won't be duplicates in many-to-many tables., so you need a surrogate PK to interact with that info in an accurate way – Don Cheadle Nov 13 '14 at 17:55
  • But then you're not actually modeling a many-to-many relationship anymore. You're modeling something else that happens to reference two child tables in a non-unique way. – Ian McLaird Nov 13 '14 at 18:20
  • As for the answers being from 2010, that's only four years ago, and the database normalization rules that give rise to these designs date from the 1970s. – Ian McLaird Nov 13 '14 at 18:37
  • @IanMcLaird with time comes experience... if OP is truly asking about a many-to-many relationship, and not something else, then my answer is 100% wrong... I'd delete this answer but it's accepted – Don Cheadle Mar 30 '15 at 21:22