2

We (almost) always add a general id column as a primary key while designing sql tables. If a table already contains a unique column, I wonder if we should designate that as the primary key and not use the id column or still use it while just adding a unique key index for the unique column. And what would be the implications of either of the approaches mentioned above?

We recently had a case where a table had a UUID designated as the primary key but the said column led to certain issues when mysqldump was used to dump the database. The said table also has a non-null unique column of type VARCHAR so to work around the aforementioned issue, the id column was removed and instead, the unique key column was designated as the primary key.

This made me wonder if we should always do this or only in specific cases and if it would impact the operations on that table in any way. Some of the questions that come to mind are:

  1. Would it affect the performance? If yes, what if the said table has a lesser number of records, eg., less than a hundred?
  2. The said unique column being a string-type column may contain anything. So, would that result in any issue while joining it with another table via a primary-foreign key relationship?

Any help in this regard is highly appreciated.

  • It depends on what is stored in that unique key column. Remember, whenever you have to create a foreign key in another table, you have to refer either PK or UK of the main table there. So, in the referencing table, you will have to repeat those values if you use UK as the referencing column. Also note that UK may have NULL in it, while PK cannot have NULL. So, if your UK is a NOT NULL column, then you can make it a PK. Just make sure it is logical. – Ishan May 03 '23 at 06:02
  • Performance-wise, there does not seem to be much of a difference between INT, CHAR, and VARCHAR indexes - https://stackoverflow.com/questions/332300/is-there-a-real-performance-difference-between-int-and-varchar-primary-keys – Ishan May 03 '23 at 06:12
  • This question is opinion-based, and too broad given Stack Overflow's [guidelines for asking questions](https://stackoverflow.com/help/how-to-ask). There's no specific case being presented in this question, so the only true answer is, "sometimes yes, sometimes no." – Bill Karwin May 03 '23 at 06:18
  • 1
    @BillKarwin yes, but at least you can give a "list of considerations" as my answer does. And the question can be reworded to ask for a list of considerations instead of asking "should we?" – Mike Nakis May 03 '23 at 06:23
  • @MikeNakis I've tried to post answers like you describe, and I've gotten downvotes and even sometimes had my answers deleted. The Stack Overflow admins are strict that questions should be about specific problems with code shown and answers should be the unequivocal correct answer to that problem. I do think broader questions would be useful, but it's not what they want on this site. – Bill Karwin May 03 '23 at 06:32
  • 1
    @MikeNakis Right. I have update the question title and also added some context to the question. – Junaid Farooq May 03 '23 at 06:39

2 Answers2

1

2 out of 3 tables I have written have a "natural" PRIMARY KEY. I avoid artificially adding an id to such tables.

Here are some common patterns. (id is AUTO_INCREMENT`)

  • A uuid is very random, hence this may be optimal:

      PRIMARY KEY (id),
      UNIQUE(uuid)
    
  • For many-to-mapping table between foo and bar, this saves both space and speed, while enforcing that you don't accidentally insert duplicates. It is implemented (in InnoDB) as two 2-column BTrees:

      PRIMARY KEY(foo_id, bar_id),  -- for mapping one way
      INDEX(bar_id, foo_id)         -- for mapping the other way
    
  • Sometimes the artificial id is beneficial, but this can let you use the PK for "locality of reference":

      PRIMARY KEY(user_id, id),  -- cluster together all the users stuff
      INDEX(id)    -- sufficient to heep AUTO_INCREMENT happy
    

As for your specific questions:

  • The datatype(s) of the column(s) in the PK are of secondary importance. The big issue is how many rows need to be touched.
  • When testing on multiple columns, a composite (multi-column) index usually wins over two individual columns, regardless of cardinality.
  • TEXT and BLOB cannot be indexed, and prefixing (eg, INDEX(ext(50)) does not really address your concerns.
  • Foreign keys only check for the existence of the value; the type of index (primary/unique/index) does notmatter.
  • UUIDs are evil in many ways; avoid them where practical. [My opinion.]
  • PARTITIONing a table adds more cans-of-worms.
  • INSERTing a row must wait for all PRIMARY and UNIQUE keys to be checked for dups before completing. Other secondary keys are inserted in a 'delayed' technique (using the "Change Buffer"). That is, there is a slight benefit in minimizing the number of unique keys in a table.
Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Here is a list of considerations for using an existing unique column as primary key instead of adding a special column solely for that purpose:

  • The unique column must be non-nullable.
  • The unique column should be inexpensive. (For example an integer, not a string, especially not a compound key. This is a performance concern.)
  • The value of the unique column is expected to never change throughout the lifetime of the row. (Even if your RDBMS allows you to change the value of a primary key, in practice you are strongly advised to avoid doing such acrobatics.)
  • The unique column is not likely to be refactored into something else (especially not into a non-unique column) in a future revision of the database schema.
  • The unique column should, ideally, be a sequentially increasing number, so that you can make use of a clustered index. (This is a performance concern.)
Mike Nakis
  • 56,297
  • 11
  • 110
  • 142