11

Let's say I need a simple table with account id and no other information. There are two ways to do it:

id varchar(255) PRIMARY KEY

Or to add a numeric primary key:

id int PRIMARY KEY
accountId varchar(255) UNIQUE NOT NULL

What are the advantages / disadvantages of both approaches and which one would you choose and why?

What implications does the first solution has to maintainability (what if we need to change the id for a single row) and for performance?

Richard Knop
  • 81,041
  • 149
  • 392
  • 552

3 Answers3

14

This boils down to the surrogate key versus natural key debate in the database world. See for example here, here and here for texts on the topic. I think both choices are valid, but in this case I would choose the AccountID as a natural key (given that the AccountID is unique for each account, will not be null, and will not be subject to changes), because it means less overhead. In this case, I do not see added value to a surrogate key.

Natural keys:

  • have meaning for the user
  • are hard to change when needed
  • may lead to needing less joins in queries

Surrogate keys:

  • don't mean anything to the user
  • are not subject to changes
  • may lead to needing more joins in queries
  • may require extra or larger indexes
Community
  • 1
  • 1
Josien
  • 13,079
  • 5
  • 36
  • 53
  • 2
    Excellent answer. I had not realised that the question in fact was about surrogate keys. Another problem would be that other tables, could have FKs pointing to this PK. Apart from the "stable keys" problem, it would also imply that all referencing tables would need a varchar(255) to refer to us. – wildplasser Dec 07 '12 at 11:46
  • What if the natural key is not going to ever change? Is it better to use natural keys then? – Richard Knop Dec 07 '12 at 11:49
  • Good answer but I disagree on the *"may require extra or larger indexes"* part. A `VARCHAR(255)` index (and thus 256 or 767 bytes wide, depending on charset) on the tables that reference this one will be needed in the "Natural keys" case and not in the "Surrogate keys" (where the indexes will probably be 4 or 8 bytes wide). – ypercubeᵀᴹ Dec 07 '12 at 11:53
  • If the natural key will never change, I'd say use it. But @wildplasser makes a good point: a varchar(255) column type is quite a large column to use as a key. If this column is referenced in lots of other tables, a surrogate might be better. – Josien Dec 07 '12 at 11:59
  • 1
    @ypercube I guessing what he meant is that you can't just _replace_ the natural key with the surrogate key. Surrogate key must typically exist _alongside_ the natural key, which means one _extra_ index (so the sum of the index sizes increases). That being said, if your access path is primarily through the surrogate, then surrogate's smaller index definitely has its advantages... – Branko Dimitrijevic Dec 07 '12 at 12:02
  • @BrankoDimitrijevic, that's what I meant, thanks for putting it so well. – Josien Dec 07 '12 at 12:26
  • I agree on the "extra or larger indexes" on this table. But there would be wider indexes on the other tables. – ypercubeᵀᴹ Dec 07 '12 at 12:39
2

The difference is that the PRIMARY KEY constraint implies/enforces a NOT NULL CONSTRAINT. In the first example the varchar(255) will be effectively promoted to varchar(255) NOT NULL

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE TABLE pk
        ( id varchar(255) PRIMARY KEY
        );

CREATE TABLE uniq
        ( id int PRIMARY KEY
        , accountid varchar(255) UNIQUE
        );

INSERT INTO pk (id) VALUES(NULL);
INSERT INTO uniq (id, accountid) VALUES(1, NULL);

Result:

DROP SCHEMA
CREATE SCHEMA
SET
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "pk_pkey" for table "pk"
CREATE TABLE
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "uniq_pkey" for table "uniq"
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "uniq_accountid_key" for table "uniq"
CREATE TABLE
ERROR:  null value in column "id" violates not-null constraint
INSERT 0 1

The first insert fails because of the PK (-->>NOT NULL) constraint; the second one succeeds.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
1

if the contents of that column are unique (which seems to be the case of IDs), then go ahead and make it the primary key, otherwise create another numeric column as a primary key.

regards,

CME64
  • 1,673
  • 13
  • 24