98

I am coming from MySQL, and in MySQL you can use AUTOINCREMENT for a row's unique id as the primary key.

I find that there is no AUTOINCREMENT in Postgresql, only SEQUENCE or UUID.I have read somewhere that we can use UUID as the primary key of a table. This has the added advantage of masking other user's id (as I want to build APIs that take the ID in as a parameter). Which should I use for Postgresql?

Braiam
  • 1
  • 11
  • 47
  • 78
Justin Leo
  • 1,975
  • 3
  • 15
  • 14
  • *PostgreSQL* provides [`SERIAL`](https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-SERIAL) for autoincrement keys. In addition to smaller size, higher efficiency, and better readability, integer keys have the advantage of being sortable. You need very strong and weighty justification to prefer `UUID`s. – Anton Shepelev Aug 23 '22 at 15:43

5 Answers5

138

A sequence in PostgreSQL does exactly the same as AUTOINCREMENT in MySQL. A sequence is more efficient than a uuid because it is 8 bytes instead of 16 for the uuid. You can use a uuid as a primary key, just like most any other data type.

However, I don't see how this relates to the masking of a user ID. If you want to mask the ID of a certain user from other users, you should carefully manage the table privileges and/or hash the ID using - for instance - md5().

If you want to protect a table with user data from snooping hackers that are trying to guess other IDs, then the uuid type is an excellent choice. Package uuid-ossp has several flavours. The version 4 is then the best choice as it has 122 random bits (the other 6 are used for identification of the version). You can create a primary key like this:

id uuid PRIMARY KEY DEFAULT uuid_generate_v4()

and then you will never have to worry about it anymore.


PostgreSQL 13+

You can now use the built-in function gen_random_uuid() to get a version 4 random UUID.

Alexis Wilke
  • 19,179
  • 10
  • 84
  • 156
Patrick
  • 29,357
  • 6
  • 62
  • 90
  • 3
    I'm not wanting to mask that user's ID, just not wanting to let the a user "guess" other user's id (e.g. is that user's ID is 154 and I'm doing an API request, one can guess there are other user IDs 153 or 155 and snoop around a GET request that doesn't require an access token) – Justin Leo Oct 22 '15 at 06:17
  • 26
    @Justin: I'd use both. Define your primary key as a `serial`, and include a unique `uuid` column to be used for API calls. – Nick Barnes Oct 22 '15 at 07:02
  • 3
    I'm unsure that the efficiency gains are worth the complexity of using primary and alternate keys, though anything that serves the purpose of obscuring potential access patterns to unknown/unauthorised data is good. Any system that relies on people not being able to guess keys is inherently insecure if the key space is consumed in a predictable (e.g. sequential) manner. – holdenweb Jan 31 '20 at 13:31
  • 12
    I agree with @holdenweb. If you're going to pay the indexing cost for a UUID, you might as well make it your PK. Sidenote, you may want to consider using [ULIDs](https://github.com/ulid/spec) instead of any RFC 4122 conforming UUID since ULIDs are lexicographically sortable while UUIDs aren't. This sorting has significant indexing advantages, as new ids are always added to the "end" of an index instead of randomly. – DharmaTurtle Aug 21 '20 at 12:44
  • 3
    the new version is `gen_random_uuid () → uuid`. From the official docs https://www.postgresql.org/docs/current/functions-uuid.html – Sang Nov 03 '21 at 04:20
  • @transang The `gen_random_uuid()` was included in the standard PostgreSQL installation as of version 13, i.e. September 2020. Not a version that everyone is using already. But thanks for the hint and answer updated. – Patrick Nov 03 '21 at 08:14
27

For many years I developed applications for databases using PKs and FKs as numerical sequential values. This has worked perfectly, but in recent years when creating cloud applications where information will be exchanged between applications and we will have integrations between various applications developed by us, we realized that the use of sequential IDs in our APIs ended up creating an effort.

In some applications we have to find the ID (of the target application) to be sent via the API call, on the other hand our database tables, in all our applications have, in addition to the sequential PK / FK column, a UUID column, which was not used in API calls. In this scenario we decided to rewrite the APIs so that the UUID column was used.

This solved some of the problems because one of our desktop applications would have their data migrated to another cloud application, this cloud application also used PK / FK columns. When migrating this data we had to change the values ​​of the PKs / FKs for new sequences as the sequences could clash between the values ​​of the desktop application and the values ​​of the cloud application. With this in mind we chose to switch cloud application PKs / FKs to UUID, since data coming from the desktop application had a UUID column.

The problem then was to convert the cloud application tables by turning the INT columns (PKs and FKs) into UUID columns without losing the table information. That was a big task, but it was made easier because I ended up building an application that makes this change easer. The application changes every PK / FK integer column to UUID, keeping the data and relationships. Anyone interested follows the link:

https://claytonbonelli.github.io/int_pk2uuid_pk/

Clayton Bonelli
  • 371
  • 3
  • 4
  • 20
    Existing serial ids can also be converted to uuids by using predictable uuid generator, for example `update x set uuid = uuid_generate_v5(uuid_ns_url(), 'some-namespace/' || id)`. Thus it's possible to add uuid columns for each table, populate them with values independently without necessity of doing lookups/mapping in memory makes things faster. – Ski Jan 18 '20 at 23:24
  • 1
    We are designing a new application, which will be integrated(in a few years maybe) with other internal systems. Your comment is very helpful! – Bryan Yin Nov 09 '22 at 05:54
  • UUIDs seem to be the more extensible and "futureproof" way of storing the PKs. Especially when dealing with disjointed systems that duplicate data, but need to have the same data consistency. – Matthew S Apr 10 '23 at 19:19
26

You can use UUID as primary key in your table as it will be unique. However do keep in mind that UUID will occupy a bit more space as compared to SEQUENCE. And also they are not very fast. But yes they are for sure unique and hence you are guaranteed to get a consistent data.

You can also refer:

Koen.
  • 25,449
  • 7
  • 83
  • 78
Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
  • The article linked here about UUID vs sequence is a bit odd because it says more than once that the UUID is 37 characters long... – Marco Altieri Sep 23 '20 at 16:44
  • 5
    Can you quantify the "not very fast" statement? Of course generating a uuid will take longer than incrementing an id, but is the difference in time significant enough to be a legitimate concern? – mowwwalker Jan 31 '21 at 02:40
4

Other answers mention performance issues of UUIDs without much details or references (quote: 'they are not very fast').

I've found one synthetic test which measured UUID joins to be ~5% slower that int8, and index size ~40% higher (not twice, as one can expect from other answers, as index size is not the same as size of all elements).

Alleo
  • 7,891
  • 2
  • 40
  • 30
0

If case you are storing UUID in String type use this query to convert column type.

ALTER TABLE TABLE_NAME
    ALTER COLUMN ID TYPE uuid USING ID::uuid;

Next use the below query to make UUID column as auto-increment.

ALTER TABLE TABLE_NAME ALTER COLUMN ID SET DEFAULT gen_random_uuid();

If this still doesn't work, you may need to do the following as well:

CREATE EXTENSION pgcrypto;
neel4soft
  • 507
  • 1
  • 4
  • 12