38

I have a postrges database with a table contains key codes that I have generated using a python function. I would like to be able to hash this column such that each time a key code is added to it, the key is hashed. How can I get postgres to do that? Or what will be the best way to store these codes. Here is a sample of the column I would like to hash in my table.

 key_codes | 
-----------+
 L7G4J83K  |        
 J70KG169  |         
 L69E540K  |        
 GL8E9C3J  |         
 6C0LE215  |         
 9G01C8JA  |         
 1G9KC58A  |         
ROMANIA_engineer
  • 54,432
  • 29
  • 203
  • 199
roykasa
  • 2,907
  • 6
  • 28
  • 29

3 Answers3

63

Use a trigger to set the hash column on insert and update. For SHA-256, use the pgcrypto extension module's digest function.

Since you haven't specified your PostgreSQL version I'll assume you're using the current 9.2 in the following examples.

Here's how to invoke a sha256 digest function:

regress=# CREATE EXTENSION pgcrypto;
CREATE EXTENSION

regress=> SELECT digest('blah', 'sha256');
                               digest                               
--------------------------------------------------------------------
 \x8b7df143d91c716ecfa5fc1730022f6b421b05cedee8fd52b1fc65a96030ad52
(1 row)

Note that the CREATE EXTENSION function must be run as a superuser.

The trigger is pretty simple. Something like this would do, assuming your table looks like this:

CREATE TABLE some_table ( key_codes text, hash bytea );

CREATE OR REPLACE FUNCTION hash_update_tg() RETURNS trigger AS $$
BEGIN
    IF tg_op = 'INSERT' OR tg_op = 'UPDATE' THEN
        NEW.hash = digest(NEW.key_codes, 'sha256');
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_hash_update 
BEFORE INSERT OR UPDATE ON some_table 
FOR EACH ROW EXECUTE PROCEDURE hash_update_tg();

Usage:

regress=> INSERT INTO some_table(key_codes) VALUES ('fred');
INSERT 0 1
regress=> SELECT * FROM some_table;
 key_codes |                                hash                                
-----------+--------------------------------------------------------------------
 fred      | \xd0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5
(1 row)

You can reduce the overhead of the trigger execution by making the update trigger conditional. Instead of the above CREATE TRIGGER, use both of these:

CREATE TRIGGER some_table_hash_insert
BEFORE INSERT ON some_table 
FOR EACH ROW 
EXECUTE PROCEDURE hash_update_tg(); 

CREATE TRIGGER some_table_hash_update 
BEFORE UPDATE ON some_table 
FOR EACH ROW 
WHEN ( NEW.key_codes IS DISTINCT FROM OLD.key_codes ) 
EXECUTE PROCEDURE hash_update_tg(); 
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 2
    Why not `CREATE TRIGGER ... BEFORE INSERT OR UPDATE` instead of `IF tg_op = 'INSERT' OR tg_op = 'UPDATE'` ? – Ihor Romanchenko Dec 03 '12 at 13:34
  • 1
    Thanks Craig this has really helped. – roykasa Dec 03 '12 at 13:36
  • 4
    + `IF NEW.key_codes IS DISTINCT FROM OLD.key_codes THEN NEW.hash = ...`. Hash functions are really slow. – Ihor Romanchenko Dec 03 '12 at 13:37
  • @IgorRomanchenko Or better, in 9.2, `CREATE` a separate trigger for `UPDATE` that's defined as `WHEN NEW.key_codes IS DISTINCT FROM OLD.key_codes` so the trigger doesn't run at all unless the hash changes. I didn't want to complicate it too much for this user. As for `tg_op` - it looks like I omitted the `CREATE TRIGGER`, but that's what I did. The `tg_op` test is just paranoia and habit, I think it's a good idea to always show what conditions the trigger is supposed to handle. Edited. – Craig Ringer Dec 03 '12 at 23:50
  • 2
    @CraigRinger Thank you. Didn't know about `WHEN ( condition )` for a trigger. – Ihor Romanchenko Dec 04 '12 at 08:26
  • @CraigRinger Should those two triggers at the end have different names? Perhaps the first should be `some_table_hash_insert` rather than "_update"? – Basil Bourque Dec 08 '14 at 07:42
  • @CraigRinger I found two syntax errors when using your example code (modified to fit my case) under Postgres 9.4 (beta 2 or 3) using pgAdmin app. (1) The `WHEN` clause must have parens around its test condition. (2) The `FOR EACH ROW` must come *before* the `WHEN` clause. I edited your example code for both of these; please review. And much thanks for the extremely helpful spot-on answer. – Basil Bourque Dec 08 '14 at 08:30
  • @BasilBourque Thanks for that. I don't always have time to fully test all sample code in posts. – Craig Ringer Dec 08 '14 at 09:03
  • Why there is `CREATE TRIGGER ... BEFORE INSERT OR UPDATE` AND `IF tg_op = 'INSERT' OR tg_op = 'UPDATE'`? Isn't it reduntant? – omikron Oct 23 '15 at 11:27
  • @omikron Yes it is. It's there mostly for when people inevitably copy & paste, modify and get confused – Craig Ringer Oct 23 '15 at 11:32
  • why does it start with `\x`? – user72840184 Apr 12 '20 at 01:24
  • 1
    @user72840184 Indicates its a hexadecimal representation of binary – Craig Ringer Apr 24 '20 at 05:57
  • How could you avoid saving the key_codes attribute? Because I see that little sure – Braian Coronel Aug 30 '20 at 00:03
  • So much code for something so simple.... – étale-cohomology Oct 26 '21 at 12:44
38

Starting from PostgreSQL 11 you could use built-in functions to calculate hash value:

SELECT sha256('hello world!');
-- x7509e5bda0c762d2bac7f90d758b5b2263fa01ccbc542ab5e3df163be08e6ca9

db<>fiddle demo

Other Binary String Functions

+----------------+--------------+---------------+
|   Function     | Return Type  |  Description  |
+----------------+--------------+---------------+
| sha224(bytea)  | bytea        | SHA-224 hash  |
| sha256(bytea)  | bytea        | SHA-256 hash  |
| sha384(bytea)  | bytea        | SHA-384 hash  |
| sha512(bytea)  | bytea        | SHA-512 hash  |
+----------------+--------------+---------------+

Note that for historic reasons, the function md5 returns a hex-encoded value of type text whereas the SHA-2 functions return type bytea. Use the functions encode and decode to convert between the two, for example encode(sha256('abc'), 'hex') to get a hex-encoded text representation.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
11

Starting with PostgreSQL 12 you can use a generated column; it could be as simple as:

CREATE TABLE codes (
    ...,
    key_code text,
    sha_code text GENERATED ALWAYS AS (encode(sha256(key_code::bytea), 'hex')) STORED
);

As with other methods, you may need to CREATE EXTENSION IF NOT EXISTS pgcrypto; if not already part of your schema.

Generated columns can be selected and indexed like any other:

CREATE INDEX idx_sha_codes ON codes USING btree (sha_code);
SELECT id, sha_code FROM codes WHERE sha_code = '...';

There are notable constraints e.g. they can't reference other generated columns or other tables. However, this sort of derived value is almost a perfect use case. More at https://www.postgresql.org/docs/12/ddl-generated-columns.html.

inopinatus
  • 3,597
  • 1
  • 26
  • 38
  • This approach appears to throw an error. `[42P17] ERROR: generation expression is not immutable` – Nathan Apr 16 '21 at 14:33
  • 1
    This is taken directly from a real working application, so no, it does not throw an error. If your expression is something different, then ensure it is an immutable result, otherwise postgresql will not permit it as a generated column. – inopinatus Apr 20 '21 at 06:07
  • why did you encode that as text? Could we save it as bytea column? – shampoo Dec 14 '21 at 14:49
  • 1
    It's encoded as text to suit both the original question, the natural behaviour of the ORM that's used in the application this came from, and the tendency of the author (myself) to work with human-readable formats first, and for SHA the common choices are hex or base64. You could indeed store it as a bytea if you preferred. In my case this would've shifted the burden of conversion to either the read or to the the application, so the outcome would be a wash, and definitely worse if any subsequent debugging or human-driven analysis required. – inopinatus Jan 25 '22 at 00:22