28

Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.

Ideally there would be a way for Postgres to handle both of these concerns:

  1. generate the string
  2. ensure its uniqueness

And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.

John Bachir
  • 22,495
  • 29
  • 154
  • 227

6 Answers6

19

I don't claim the following is efficient, but it is how we have done this sort of thing in the past.

CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
    new_uid text;
    done bool;
BEGIN
    done := false;
    WHILE NOT done LOOP
        new_uid := md5(''||now()::text||random()::text);
        done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
    END LOOP;
    RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;

make_uid() can be used as the default for a column in my_table. Something like:

ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();

md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(...,'base64') except some of the characters used in base-64 are not URL friendly.

Dwayne Towell
  • 8,154
  • 4
  • 36
  • 49
  • 2
    Could this result in a race condition? (I don't know enough about the runtime environment of pg functions to think it through...) – John Bachir Oct 23 '13 at 06:20
  • 3
    It is not safe because 2 concurrent inserts might end up with same random string and one will exit with error, use insert on unique field instead of select/not exists – BIOHAZARD Nov 05 '18 at 10:43
  • 1
    an infinitely better idea is to use something from the data with the current time to generate a hash md5(now() | email | password) will ensure each user gets a unique id – PirateApp Sep 13 '19 at 04:01
14

All existing answers are WRONG because they are based on SELECT while generating unique index per table record. Let us assume that we need unique code per record while inserting: Imagine two concurrent INSERTs are happening same time by miracle (which happens very often than you think) for both inserts same code was generated because at the moment of SELECT that code did not exist in table. One instance will INSERT and other will fail.

First let us create table with code field and add unique index

CREATE TABLE my_table
(
    code TEXT NOT NULL
);

CREATE UNIQUE INDEX ON my_table (lower(code));

Then we should have function or procedure (you can use code inside for trigger also) where we 1. generate new code, 2. try to insert new record with new code and 3. if insert fails try again from step 1

CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
    new_code TEXT;
BEGIN

    LOOP
        new_code := LOWER(SUBSTRING(MD5(''||NOW()::TEXT||RANDOM()::TEXT) FOR 8));
        BEGIN
            INSERT INTO my_table (code) VALUES (new_code);
            EXIT;
        EXCEPTION WHEN unique_violation THEN

        END;
    END LOOP;

END;
$$ LANGUAGE PLPGSQL;

This is guaranteed error free solution not like other solutions on this thread

Eric Aya
  • 69,473
  • 35
  • 181
  • 253
BIOHAZARD
  • 1,937
  • 20
  • 23
  • 1
    I suppose if you made the FOR small enough you'd want to exit the loop after a certain count so it doesn't go on forever. I agree this one is the best answer though, others are wrong – Dan Parker Jan 21 '19 at 19:40
  • 1
    Is there reason for creating `UNIQUE INDEX` with the `lower()` ? – Ulvi Jan 23 '22 at 16:09
  • 1
    Can we use it with a trigger? I tried but got error. If yes, could you please include it in the answer or help me out? I'm stuck – Ulvi Jan 23 '22 at 19:43
  • It's unfortunately not possible to use this as a trigger, if you try it wont work. – Jane Panda Nov 04 '22 at 22:27
8

Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.

For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.

For a 63 bits version based on bigint (9223372036854775808 distinct possible values), see this other answer.

You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).

Community
  • 1
  • 1
Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
6

The easiest way probably to use the sequence to guarantee uniqueness (so after the seq add a fix x digit random number):

CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
  id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
  txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;

However this will waste a huge amount of records. (Note: the max bigInt is 9223372036854775807 if you use 8 digit random number at the end, you can only have 922337203 records. Thou 8 digit is probably not necessary. Also check the max number for your programming environment!)

Alternatively you can use varchar for the id and even convert the above number with to_hex() or change to base36 like below (but for base36, try to not expose it to customer, in order to avoid some funny string showing up!):

PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?

Community
  • 1
  • 1
holdfenytolvaj
  • 5,637
  • 1
  • 17
  • 10
4

Check out a blog by Bruce. This gets you part way there. You will have to make sure it doesn't already exist. Maybe concat the primary key to it?

Generating Random Data Via Sql

"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"

  SELECT
(
  SELECT string_agg(x, '')
  FROM (
    SELECT chr(ascii('a') + floor(random() * 26)::integer)
    FROM generate_series(1, 40 + b * 0)
  ) AS y(x)
)
FROM generate_series(1,5) as a(b);
John Bachir
  • 22,495
  • 29
  • 154
  • 227
Kuberchaun
  • 29,160
  • 7
  • 51
  • 59
  • 1
    The uniqueness enforcement is what I'm more interested in -- it's easy for me to generate a string but it's a pain to program uniqueness enforcement from the application side. I've updated my question to explain this a bit more clearly. – John Bachir Oct 23 '13 at 02:00
  • Well logically there are two ways. Check all existing records. Or introduce some kind of guaranteed uniqueness to your string. My thought was to concat the primary key to the random string. – Kuberchaun Oct 23 '13 at 02:13
1

Use primary key in your data. If you really need alphanumeric unique string, you can use base-36 encoding. In PostgreSQL you can use this function.

Example:

select base36_encode(generate_series(1000000000,1000000010));

GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2
Community
  • 1
  • 1
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155