Do link tables need a meaningless primary key field?

Question

I am working on a couple of link tables and I got to thinking (Danger Will Robinson, Danger) what are the possible structures of a link table and what are their pro's and con's.

I came up with a few possible strictures for the link table:

Traditional 3 column model

id - auto-numbered PRIMARY
table1fk - foreign key
table2fk - foreign key

It's a classic, in most of the books, 'nuff said.

Indexed 3 column model

id - auto-numbered PRIMARY
table1fk - foreign key INDEX ('table1fk')
table2fk - foreign key INDEX ('table2fk')

In my own experience, the fields that you are querying against are not indexed in the traditional model. I have found that indexing the foreign key fields does improve performance as would be expected. Not a major change but a nice optimizing tweak.

Composite key 2 columns `ADD PRIMARY KEY ('table1fk' , 'table2fk')`

table1fk - foreign key
table2fk - foreign key

With this I use a composite key so that a record from table1 can only be linked to a record on table2 once. Because the key is composite I can add records (1,1), (1,2), (2,2) without any duplication errors.

Any potential problems with the composite key 2 columns option? Is there an indexing issue that this might cause? A performance hit? Anything that would disqualify this as a possible option?

This is the same as http://stackoverflow.com/questions/1839584/should-i-use-an-index-column-in-a-many-to-many-link-table — Ralph Lavelle, Dec 04 '09 at 04:15

score 22 · Answer 1 · answered Dec 03 '09 at 22:13

22

I would use composite key, and no extra meaningless key.

I would not use a ORM system that enforces such rules on my db structure.

answered Dec 03 '09 at 22:13

Brimstedt

3,020
22
32

besides, I try to avoid IDENTITY/AUTO_INCREMENT fields, as they tend to make work with tables harder, and gives you less control. But it's beside the point in this case, and they DO have their purpose.. – Brimstedt Dec 03 '09 at 22:37
1

I woudln't choose to use an ORM that enforces that, either, but in some jobs (like mine) you don't get to choose sometimes. I work on close to 40 projects a year, and not all of them start from scratch but are enhancements on existing client sites where an ORM is already in place. – Matt Dawdy Dec 04 '09 at 15:48
It depends on your use case. – Brimstedt Feb 23 '17 at 14:28

Cade Roux · Accepted Answer · 2009-12-03T23:52:47.593

For true link tables, they typically do not exist as object entities in my object models. Thus the surrogate key is not ever used. The removable of an item from a collection results in a removal of an item from a link relationship where both foreign keys are known (Person.Siblings.Remove(Sibling) or Person.RemoveSibling(Sibling) which is appropriately translated at the data access layer as usp_Person_RemoveSibling(PersonID, SiblingID)).

As Mike mentioned, if it does become an actual entity in your object model, then it may merit an ID. However, even with addition of temporal factors like effective start and end dates of the relationship and things like that, it's not always clear. For instance, the collection may have an effective date associated at the aggregate level, so the relationship itself may still not become an entity with any exposed properties.

I'd like to add that you might very well need the table indexed both ways on the two foreign key columns.

score 13 · Answer 3 · answered Dec 03 '09 at 22:25

13

If this is a true many-to-many join table, then dump unecessary id column (unless your ORM requires one. in that case you've got to decide whether your intellect is going to trump your practicality).

But I find that true join tables are pretty rare. It usually isn't long before I start wanting to put some other data in that table. Because of that I almost always model these join tables as entities from the beginning and stick an id in there.

answered Dec 03 '09 at 22:25

Mike

351
1
5

1

this was valuable insight on their initial and later practicality. thanks – Kevin Danikowski Sep 18 '19 at 00:27

score 11 · Answer 4 · answered Jan 14 '11 at 00:32

11

Having a single column pk can help out alot in disaster recovery situation. So though while correct in theory that you only need the 2 foreign keys. In practice when the shit hits the fan you may want the single column key. I have never been in a situation where i was screwed because I had a single column identifier but I have been in ones where I was screwed because I didn't.

answered Jan 14 '11 at 00:32

mark

111
1
2

8

+1 for pointing this out, I know I should have a separate primary key for enabling disaster recovery but I don't know how it really helps. Can you please elaborate on how having a separate primary key helps and/or point to some materials? – Ferit Buyukkececi Nov 12 '14 at 16:18

score 4 · Answer 5 · answered Dec 03 '09 at 23:58

4

Composite PK and turn off clustering.

answered Dec 03 '09 at 23:58

Josh

68,005
14
144
156

3

could you elaborate on the clustering? why turn it off? – Brimstedt Dec 04 '09 at 05:18
4

When you create a primary key by default it is created as a clustered index. That's fine if your primary keys will be primarily inserted or sought in sorted order. But for a link table like this the relationships will most likely not be in any meaningful order thus it becomes more work for SQL to maintain the table. – Josh Dec 04 '09 at 14:49

score 3 · Answer 6 · answered Jul 13 '13 at 18:01

3

I have used composite key to prevent duplicate entry and let the database handle the exception. With a single key, you are rely on the front-end application to check the database for duplicate before adding a new record.

answered Jul 13 '13 at 18:01

Kumar SInha

31
1

score 2 · Answer 7 · answered Dec 04 '09 at 12:30

There is something called identifying and non-identifying relationship. With identifying relationships the FK is a part of the PK in the many-to-many table. For example, say we have tables Person, Company and a many-to-many table Employment. In an identifying relationship both fk PersonID and CompanyID are part of the pk, so we can not repeat PersonID, CompanyID combination.

TABLE Employment(PersonID int (PK,FK), CompanyID int (PK,FK))

Now, suppose we want to capture history of employment, so a person can leave a company, work somewhere else and return to the same company later. The relationship is non-identifying here, combination of PersonID, CompanyID can now repeat, so the table would look something like:

TABLE Employment(EmploymentID int (PK), PersonID int (FK), CompanyID int (FK), 
                     FromDate datetime, ToDate datetime)

Thank you for adding a reason as well as a context to your answer. That is really helpful in visualizing the suggestion vis-a-vis one's current predicament. — burf, Aug 20 '23 at 07:16

Matt Dawdy · Answer 8 · 2009-12-04T15:47:04.460

1

If you are using an ORM to get to/alter the data, some of them require a single-column primary key (Thank you Tom H for pointing this out) in order to function correctly (I believe Subsonic 2.x was this way, not sure about 3.x).

In my mind, having the primary key doesn't impact performance to any measurable degree, so I usually use it.

edited Dec 04 '09 at 15:47

answered Dec 03 '09 at 22:09

Matt Dawdy

19,247
18
66
91

2

By "primary key" I'm guessing that you mean, "single-column primary key". That's a pretty important distinction. A composite primary key is also a primary key. – Tom H Dec 04 '09 at 04:57
1

Doh! That's exactly what I meant. Thanks for the clarification. I'll edit the answer to show that, too. – Matt Dawdy Dec 04 '09 at 15:44

score 0 · Answer 9 · answered Dec 04 '09 at 00:00

0

If you need to traverse the join table 'in both directions', that is starting with a table1fk or a table2fk key only, you might consider adding a second, reversed, composite index.

ADD KEY ('table2fk', 'table1fk')

answered Dec 04 '09 at 00:00

martin clayton

76,436
32
213
198

score 0 · Answer 10 · answered Dec 04 '09 at 05:39

0

The correct answer is:

Primary key is ('table1fk' , 'table2fk')
Another index on ('table2fk' , 'table1fk')

Because:

You don't need an index on table1fk or table2fk alone: the optimiser will use the PK
You'll most likely use the table "both" ways
Adding a surrogate key is only needed because of braindead ORMs

answered Dec 04 '09 at 05:39

gbn

422,506
82
585
676

Really? I can understand the first point, but I am surprised that joining table1->link->table2 will perform differently than table2->link->table1 – Tyson of the Northwest Dec 04 '09 at 23:31
@tvanover: It gives the optimiser more options + don't forget: you'll have a WHERE or EXISTS or may not touch table 2 at all if do "WHERE link.table2fk = @table2keyvalue" – gbn Dec 05 '09 at 10:19

score -1 · Answer 11 · answered Dec 03 '09 at 22:01

-1

i've used both, the only benefit of using the first model (with uid) is that you can transport the identifier around as a number, whereas in some cases you would have to do some string concatenation with the composite key to transport it around.

i agree that not indexing the foreign keys is a bad idea whichever way you go.

answered Dec 03 '09 at 22:01

pstanton

35,033
24
126
168

Transport? How so? I am assuming moving from db to db, or db to text. I am unfamiliar with the term. – Tyson of the Northwest Dec 03 '09 at 22:05
I believe he means passing the entity around in your code, or in a querystring, etc. You could always have 2 parameters in whatever method/function you are calling, but sometimes it's easier to have 1 that represents the record as opposed to both. – Matt Dawdy Dec 03 '09 at 22:10

score -1 · Answer 12 · answered Dec 03 '09 at 22:43

I (almost) always use the additional single-column primary key. This generally makes it easier to build user interfaces, because when a user selects that particular linking entity I can identify with a single integer value rather than having to create and then parse compound identifiers.

Do link tables need a meaningless primary key field?

Traditional 3 column model

Indexed 3 column model

Composite key 2 columns `ADD PRIMARY KEY ('table1fk' , 'table2fk')`

12 Answers12

Linked

Do link tables need a meaningless primary key field?

Traditional 3 column model

Indexed 3 column model

Composite key 2 columns ADD PRIMARY KEY ('table1fk' , 'table2fk')

12 Answers12

Linked

Composite key 2 columns `ADD PRIMARY KEY ('table1fk' , 'table2fk')`