1

If I created a table such as the following with just three columns:

column1 - column2 - column3 name - company - color

Is this bad practice in database table planning because it doesn't have a Primary Key with a auto-incrementing numerical ID?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Simon Suh
  • 10,599
  • 25
  • 86
  • 110
  • 1
    It's difficult to tell without knowing how you'll use the table: How will you be using the table? Will it be joining to other tables? Is the data likely to change? What happens If I add "Bob/IBM/Blue" and "Bob/IBM/Green"? – Derek Tomes Jun 03 '16 at 02:41
  • 1
    A clear example of a case when a single numeric primary key is not useful: a link table for a N-N relationship (where the two foreign keys make a perfectly good combined primary key). So it is not a bad practice at all, but it is situational, as @DerekTomes illustrates. – Amadan Jun 03 '16 at 02:43
  • 4
    Possible duplicate of [Should each and every table have a primary key?](http://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key) – Cole Jun 03 '16 at 02:46
  • The ideal primary key has several desirable characteristics, including:simple (single column, native datatype), short (not occupy a lot of space), uniqueness (no duplicate values), immutable (once assigned, it won't change), anonymous (doesn't convey meaningful information), non-null, etc. And a surrogate key satisfies a lot of the desirable properties. Plenty of systems are written successfully without surrogate primary keys, using natural keys as primary keys. But, successful systems are also written using surrogate primary key columns (e.g. auto_increment integer) for entity tables. – spencer7593 Jun 03 '16 at 02:51

3 Answers3

5

Almost all the tables I create have a synthetic, numeric, auto-incrementing (identity/serial) primary key. Here are reasons why I do this:

  • The primary key provides a record of the insertion order of records.
  • Foreign key constraints are easy to define and check for.
  • Rows cannot be duplicated.
  • It is easy to look up a particular row.
  • Fixed-width numeric values (integers and big ints) are efficient in indexes.

So, under most circumstances I would say that such as approach is a good practice.

Is not doing this a bad practice? Not necessarily. I go with consistency. If you have a natural primary key that happens to be a string (such as an email address, as one possibility), then go with it. If you are using a reference table to validate the values of a string, that's fine. Go for it. Just remember that it is good practice to define foreign keys, to understand insertion order, and to have efficient indexes -- and synthetic numeric keys help achieve these goals.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It might be worth to mention, that - not necessarily but in most cases - the primary key is the clustered key. Using an auto-incrementing key will append new data at the end with almost no effect to the key. If your PK is random - in the worst case a non-serial GUID - the primary Key will be an index with heavy fragmentation and therefore cause bad performance. Even worse, this will affect other indexes which use the clustered one as lookup... – Shnugo Jun 03 '16 at 02:58
  • An auto_increment integer as a surrogate key satisfies a lot of desirable properties for a primary key (simple, short, unique, non-null, anonymous, immutable, et al.)... and those desirable properties lead directly into the benefits outlined in Gordon's answer. I ditto what Gordon says... though I apply this rule to "entity" tables. (Simple relationship tables, domain tables (list of valid values), tables for implementing multi-valued attributes, and datamart/datawarehouse/prebuilt result tables may not get an auto_increment integer primary key, with a suitable candidate key availa.. +10. – spencer7593 Jun 03 '16 at 03:02
  • @Shnugo . . . I think you are arguing for why foreign key references should be to the *primary* key and not to another key. In addition, whether the primary key is clustered depends on the database. – Gordon Linoff Jun 04 '16 at 13:15
2

This is flame-war territory which is why I assume the question will eventually be closed. Worse, some considerations depend on your database structure. But the basic issues include access vs natural primary keys and specifics regarding table layout in your RDBMS.

I want to go over both sides of the controversy however, and discuss some issues in each. Personally when I work with PostgreSQL, I prefer to use natural primary keys and usually add numeric secondary keys to make joins easier.

Natural vs Synthetic Primary Keys In the Abstract

In general I find natural primary keys cleaner semantically. The PRIMARY KEY designation thus acts partly as documentation as to what is functionally dependent on what. Within a given table, this can make longer-term maintenance much easier, particularly if you are shooting for well-nromalized tables.

The problem however is that often natural primary keys span columns and as requirements change the tables can change such that the primary keys can change. This makes change management a big problem particularly as tables grow and so very often one wants a layer of abstraction around that problem, which means a second, autoincrement field used for joins.

Access vs Data Sematics

In PostgreSQL there is no effectively no difference between a combination of a not null and a unique constraint and a primary key, so this works well. However this is not the case everywhere. In MySQL with InnoDB, the table is index-oriented around the primary key, so primary key lookups are optimized at the expense of other lookups.

For this reason in a database system like that, you will find there is a serious performance benefit to making the surrogate key primary and finding another way to document the natural primary key.

Note other database systems may allow for index oriented tables around indexes other than the primary key, which would lead to this consideration coming out differently yet.

For single column tables (used to effectively enforce an enum type where the db does not support this natively) however, I see absolutely no value in adding an additional numeric primary key.

Conclusion

Whether or not this is bad practice depends on what specifically you are doing and the database you are using. It isn't necessarily a bad thing but it can be. There are issues with both approaches and they play out different in different scenarios and different database systems. But hopefully the above helps introduce the issues.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182
-3

Well in large projects we always use GUID ( Universally Unique Identifier) It is a 128-bit integer number used to identify resources. I remember reading somewhere that this is the "Academically" correct way. It also yields a lot information about a the assigned object like date, production line, manifacturing location, etc.

But of course it depends on what you are doing ... Im sure that a simple auto incremented int will do the job but the GUID is statistically more safe and professional.