102

In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There's basically two schools of thought in our group:

1:

Primary Table (Employee)   
Primary Key is called ID

Foreign table (Event)  
Foreign key is called EmployeeID

or

2:

Primary Table (Employee)  
Primary Key is called EmployeeID

Foreign table (Event)  
Foreign key is called EmployeeID

I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consistent with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.

Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now can not visually distinguish a primary key from a foreign key.

Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go an ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.

So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspectives.

Jeremy
  • 44,950
  • 68
  • 206
  • 332
  • 1
    question duplicates this http://stackoverflow.com/questions/208580/naming-of-id-columns-in-database-tables/ – Mike Henke Apr 14 '11 at 00:55
  • 1
    I read more than 10 similar questions and finally found the top 3 answers here are good : http://stackoverflow.com/a/465146/781695 – user Aug 15 '13 at 14:58
  • Just a side note: Choice 2 would allow you to 'Natural Join'. Heck, why not still do it in choice 1 by adding 'Employee.ID as EmployeeID'. But the better practise way seems to be 'Join' using 'ON Employee.ID = Event.EmployeeID'. – Leo Apr 21 '16 at 14:06
  • In both situations you will and up having to use alias (or 'table_name.column_name') in one or more queires because you are, in both cases repeating column names. – NaN Nov 11 '18 at 17:30

13 Answers13

81

If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:

SELECT name, address, amount
  FROM employees JOIN payroll USING (employee_id)

Another argument in favor of convention #2 is that it's the way the relational model was designed.

The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

Steven Huwig
  • 20,015
  • 9
  • 55
  • 79
  • 4
    SQL syntax and semantics actually give a pretty good clue as to how it should be used. e.g. USING syntax means columns with the same domain should have the same name, NULL = NULL -> NULL means NULL is "unknown" rather than "not applicable", and ON UPDATE CASCADE means that keys need only be unique, not immutable. – Steven Huwig Sep 03 '09 at 02:39
  • 7
    Even better, it allows this: `SELECT name, address, amount FROM employees NATURAL JOIN payroll`. – onedaywhen Feb 21 '12 at 11:45
  • 7
    I'd not use natural join in deployed code, because it's brittler in the case of schema additions. But for interactive querying, it's great. – Steven Huwig Apr 30 '12 at 20:44
  • 3
    +1 but there's always an exception. For example, if you have two columns in payroll that are both foreign keys to employee (one reference to the person who is being paid, the second to the manager with budget authority, for example). But we can't name *both* foreign keys `employee_id`. – Bill Karwin Sep 04 '13 at 03:36
  • I'm not sure you should let 0.1% edge cases dictate your entire naming strategy. – niico May 27 '16 at 12:01
  • 1
    The "using" keyword is MySql specific. Doesn't work in T-SQL—unfortunately. – birdus Aug 08 '17 at 18:46
  • @onedaywhen This is not better but worse. Because it is implicit join. Natural join is antipattern. Use explicit USING instead. – Eldar Agalarov Aug 21 '20 at 13:10
  • @EldarAgalarov: natural join is a relational operator, not a 'pattern'. But you have every right to feel 'anti' it! – onedaywhen Sep 14 '20 at 12:19
62

It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.

Jeff Atwood had a great article a while back on this topic. Basically people debate and argue the most furiously those topics which they cannot be proven wrong on. Or from a different angle, those topics which can only be won through filibuster style endurance based last-man-standing arguments.

Pick one and tell them to focus on issues that actually impact your code.

EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.

Russell Steen
  • 6,494
  • 6
  • 38
  • 56
  • 34
    +1, for common sense... There are more important things to argue about.. So, do it my way (choice 2) – Charles Bretana Sep 02 '09 at 19:28
  • 6
    And, for self-referencing DRI, when there are more than one FK that self-references the same PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, etc. etc... – Charles Bretana Sep 02 '09 at 19:31
13

I think it depends on your how you application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.

I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this stype of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare,

KM.
  • 101,727
  • 34
  • 178
  • 212
  • 4
    +1 for having to figure out joins with non matching column names – Raj More Sep 02 '09 at 20:00
  • 4
    on "old systems" the handicap of 8 character long names that hurts a lot more than this. I'm willing to go out on a limb and speculate that having the PK named ID was not the primary cause of the nightmare in the old systems you were dealing with. Also "it sucked in old systems" is used waaaaay too often in software development, especially databases. I routinely see people justifying any given practice A, based on the way it worked in their experience on a DB system released 10+ years ago. – Russell Steen Sep 02 '09 at 20:04
  • 2
    today's _state of the art_ applications will be _old crap_ in a few years. you might even rewrite the interface, or use the data in another platform, but your data (including your column names) will need to stand the test of time. – KM. Sep 02 '09 at 20:22
  • 2
    So people 20 years ago should have somehow used column names that made sense today, even though they only had 8 characters? Data storage formats have changed drastically over the past 20 years, and will change again in the next 20. There's no way to demonstrate that your preference will stand the test of time better than the other method listed. "column names" may themselves be "old crap" by the time people are having this discussion in 20 years, as our ability to store and manipulate data improves. Tables are a human construct that imperfectly represent data relations... – Russell Steen Sep 02 '09 at 20:37
  • "ID" - wow that communicates a lot! I now see the light, I'll do it your way @Russell Steen – KM. Sep 02 '09 at 20:45
  • 1
    Thank you for the well reasoned intellectual response. – Russell Steen Sep 10 '09 at 18:19
  • @Russell Steen, in your answer you say "Basically people debate and argue the most furiously those topics which they cannot be proven wrong on". You are one of those people! let it go. In my answer I try to explain why both might be used, follow your own advice and "focus on issues that actually impact your code" – KM. Sep 10 '09 at 19:19
  • 1
    As @KM. says - people are going back to "ID" a lot because of ORMs - and MVCs "convention over configuration" - Ruby on Rails was a catalyst / began the trend of standard naming conventions - and the primary key being called ID is one convention. When using an ORM - eg Dapper Contrib - there is an implication that "ID" is the best approach because it requires less configuration ([Key] annotation on the primary key). – niico May 27 '16 at 12:05
7

Have you considered the following?

Primary Table (Employee)   
Primary Key is PK_Employee

Foreign table (Event)  
Foreign key is called FK_Employee
Wouter
  • 1,829
  • 3
  • 28
  • 34
  • 6
    I can't stand when people down vote and don't put a reason why. This is a completely valid answer, whether or not it is palatable for some is a different question, but that is subjective and doesn't necessitate a down vote. – Jeremy Aug 19 '16 at 17:49
  • 1
    Thanks for pointing this out. I would also be interested in the reasons why you would *not* use this format. And I'm quite sure there will be good reasons... – Wouter Aug 20 '16 at 11:17
  • This is the best way out as you won't have to use `table_name.column_name` in queries and won't have to use alias for column names if don't have repeated names... – NaN Nov 11 '18 at 17:34
  • 1
    This could be considered a form of Hungarian notation. So consider the arguments in favor and against that. – Fred Nov 02 '19 at 22:01
3

Neither convention works in all cases, so why have one at all? Use Common sense...

e.g., for self-referencing table, when there are more than one FK column that self-references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, ...

Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
3

I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.

If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • 3
    +1 for recognizing that consistency is more important than being "right" (in this case) – Russell Steen Sep 02 '09 at 20:05
  • -1 for attempting to apply a "foolish consistency". Old chinese proverb says "A foolish consistency is a hobgoblin for simple minds." – Charles Bretana Jan 21 '10 at 20:54
  • @charles: in a world where different people maintain each others code, often when the writer has left and the documentation is obsolete or non-existant, this is not a foolish consistency. I am so glad I don't work with You... – MatBailie Jan 21 '10 at 22:32
  • @Dems, no offense intended, but this is foolish, for two reasons. 1) There are common, clearly understood scenarios where ANY standard would HAVE to be violated. (see my answer for examples and 2) because on this issue, at least, a standard would add very little value, except to make folks who like standards feel more comfortable... – Charles Bretana Jan 22 '10 at 01:04
  • @charles: You seem to be suggesting "if it ain't perfect, don't try it". Unfortunately, common sense to one person is not always common sense to another. In order to engender familiarity and consistency, standards -are- of benefit. I have recently worked on a piece of code where multiple people have worked on it, each with their own naming conventions for tables and fields. It's a mess, and a nightmare to maintain. Standards are a common "common sense" rather than everyone using their own personal "common sense". – MatBailie Jan 25 '10 at 13:46
  • @Dems, The pendelum doesn't have to swing so far to one side that it crashes into and knocks itself over. I suggest no such thing. A "Standard" that must be broken in common scenarios is no standard at all, and only adds confusion and extra work. Not having a standard does not imply "a nightmare to maintain". Do you have a standard for what devs wears to the office? for left or right handed mice? Or do you require everyone use a trackball? Not everything needs to be standardized. My suggesting this issue does not need a standard does not mean I advocate no standards at all... – Charles Bretana Jan 25 '10 at 16:42
  • +1 for Dems for making sense, and -1 for Charles Bretana for making mountains of molehills. – JYelton Apr 30 '10 at 19:39
  • @JYelton, Applying standards when there is no reason to do so other than making youself feel more comfortable or more organized is what is "making a mountain out of a molehill"... advocating that unnecessary standards be left alone is indeed leaving the molehill untouched and undisturbed. It never fails to amaze me how arguments can be presented that are exactly opposite the substance of what they advocate. – Charles Bretana Apr 30 '10 at 22:19
  • 1
    you could argue "ID" is more consistent - because as soon as you introduce the english language "carID" in "cars" table or "car" table? "sheepID" in "sheep" table or "sheeps" - things start to become inconsistent. If you stick to "ID" and singular table names - this is not only consistent is plays nice with many ORMs / requires less configuration too (eg Dapper Contrib) – niico May 27 '16 at 12:09
3

If you are looking at application code, not just database queries, some things seem clear to me:

  1. Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.

  2. The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.

  • Many object-relational mappers (ORM) such as Entity Framework allows you to map a table to a class with a different name. This allows you to have a class named "User" and a table named "Users". – Fred Nov 02 '19 at 22:15
2

The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (ie, "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees _ id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees _ id" is definitely redundant.

Jarett Millard
  • 5,802
  • 4
  • 41
  • 48
  • I've not decided if I like pluralised table names. By using the singular the queries seem to read better ("employee.name" as opposed to "employees.name"). Even in joins it seems to read better as you are joining single records to another table. But pluralised table names seems more accurate when thinking about the table, rather than the query. I'll be sticking with singular as that is what we use, but i think it's also the right way to go (though again, many disagree) – MatBailie Sep 03 '09 at 18:39
  • Yeah. It's more of a personal preference and/or whatever you're used to seeing, I guess. – Jarett Millard Sep 04 '09 at 13:34
2

I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:

I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:

SELECT table1.id AS parent_id, table2.id AS child_id

Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.

JYelton
  • 35,664
  • 27
  • 132
  • 191
  • 2
    `SELECT *` is a no-no for (most) production queries, anyway, so that's not much of a reason to pick a naming standard. – P Daddy Apr 30 '10 at 19:42
  • 1
    Not disagreeing: could you provide a link to a reason why this is so? I don't like the idea of having to maintain the names of 80 columns in my query. – JYelton Apr 30 '10 at 20:19
  • Can't find a link at the moment (hard to google for "*"), but I'll outline the basic points: (1) changes to the table(s) can negatively impact your application, (2) it can be bad for performance, and (3) specifying explicitly what data you actually need can make your code easier to understand. These points could expansion, and there are exceptions (as I alluded to) but that's not appropriate here. If you post this as a new question, I (and others) would be glad to elaborate further. – P Daddy Apr 30 '10 at 21:10
  • 2
    I may do that. I realize the performance benefit, but have to consider the time investment when editing code. I am always looking for ways to improve the interaction between the app and the database. Thanks. – JYelton Apr 30 '10 at 22:11
  • 1
    I am not so sure `SELECT *` is a no-no for most production queries. If it increases your speed of development significantly, and makes your code much more terse and readable - allowing you to focus on more important matters - why not `SELECT *`? It very much depends on the circumstances of each situation and is a trade off between many factors. One rule rarely fits everything. – niico May 27 '16 at 12:12
2

I've always used userId as a PK on one table and userId on another table as a FK. 'm seriously thinking about using userIdPK and userIdFK as names to identify one from the other. It will help me to identify PK and FK quickly when looking at the tables and it seems like it will clear up code when using PHP/SQL to access data making it easier to understand. Especially when someone else looks at my code.

Ross
  • 1,013
  • 14
  • 32
1

I use convention #2. I'm working with a legacy data model now where I don't know what stands for in a given table. Where's the harm in being verbose?

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
1

How about naming the foreign key

role_id

where role is the role the referenced entity has relativ to the table at hand. This solves the issue of recursive reference and multiple fks to the same table.

In many cases will be identical to the referenced table name. In this cases it becomes identically to one of your proposals.

In any case havin long arguments is a bad idea

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348
1

"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".

There is no need for additional qualification because the qualification I talked of is already there.

"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".

Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?