1

If you always will know that a value (String in this case) returned from some API will be unique, is it okay to use that said value as the primary key to uniquely define the record?

Is there really any point to add an extra "id" column that'll just be an auto incremented value?

Stuy
  • 105
  • 3
  • 10

2 Answers2

3

Natural primary keys are a perfectly viable and reasonable as primary keys. However, there are some things to think about:

  1. One of the main uses of primary keys is for foreign key relationships. Integer primary keys are pretty small and all the same length. Strings are not as good for indexes, because they (usually) vary in length and are (often) longer than an integer.
  2. Are you sure that the definitions of the primary key will not change over time?
  3. Is it safe (from a privacy/security perspective) to have the external key visible in a bunch of tables (those with foreign key references).

I will say that synthetic foreign keys (i.e. the artificial ones created as numbers) can have downsides. If you do this for every database you create, then you have incompatible keys across the database. So, something like "product id" ends up depending on the system where it is defined -- which can cause unnecessary confusion.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Let's say if those keys I'm using come from like an API, e.g: an API might send back an order and appointment. Now an appointment and order have a 1 to 1 relationship in this scenario. Would it be better to use the API's ids as a primary/foreign key constraint, or should I use my own generated IDs? – Stuy Dec 07 '20 at 18:47
  • I completely agree. Where it definitely makes a difference is the growth of a DB if there are many records in the DB. There the size of tables and indexes are things which should taken in account. One pattern which is often used: using a generic PK and adding the "natural" pk as unique column to the table, so you gain the pros of a generic key with a very small penalty of performance and size. – Bernd Farka Dec 07 '20 at 18:48
  • 1
    @stuy: basically it "should" be ok to use the keys you get but on the other hand you have no control about the API, if there is an bug you get an id twice you can't do anything.. if you have an generic id, you are still able to handle that – Bernd Farka Dec 07 '20 at 18:49
1

Aside from performance issues, a PK must serve the business logic. A PK does not always have to just be an auto-sequence number. An auto-sequence number is still useful for quick user interactions such as inquiry and update requests and in ETL cases. Consider what would happen if the API changes and would not always return a unique value. Also, look at your data row and identify the column or columns that would make the record unique from a business sense. You may get or more "candidate primary keys". Consider using the shortest of those instead of an API key and instead of the Auto-sequence. In short, I suggest you not rely on data you can't control unless there is a "contract" of some source that says it will always be unique.

NoChance
  • 5,632
  • 4
  • 31
  • 45