PostgreSQL has hash functions for many column types. You can use hashtext
if you want integer hash values, or hashtextextended
if you prefer bigint
hash values.
Note that hashXXXextended
functions take one extra parameter for seed and 0 means that no seed will be used.
In the commit message the author Robert Haas says:
Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.
Example
postgres=# SELECT hashtextextended('test string of type text', 0);
hashtextextended
----------------------
-6578719834206879717
(1 row)
postgres=# SELECT hashtext('test string of type text');
hashtext
-------------
-1790427109
(1 row)
What about other data types?
You can check all available hash functions on your backend by checking the output for \df hash*
. Below you can see the functions available in PG 14.0.
hanefi=# \df hash*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------------------------+------------------+--------------------------+------
pg_catalog | hash_aclitem | integer | aclitem | func
pg_catalog | hash_aclitem_extended | bigint | aclitem, bigint | func
pg_catalog | hash_array | integer | anyarray | func
pg_catalog | hash_array_extended | bigint | anyarray, bigint | func
pg_catalog | hash_multirange | integer | anymultirange | func
pg_catalog | hash_multirange_extended | bigint | anymultirange, bigint | func
pg_catalog | hash_numeric | integer | numeric | func
pg_catalog | hash_numeric_extended | bigint | numeric, bigint | func
pg_catalog | hash_range | integer | anyrange | func
pg_catalog | hash_range_extended | bigint | anyrange, bigint | func
pg_catalog | hash_record | integer | record | func
pg_catalog | hash_record_extended | bigint | record, bigint | func
pg_catalog | hashbpchar | integer | character | func
pg_catalog | hashbpcharextended | bigint | character, bigint | func
pg_catalog | hashchar | integer | "char" | func
pg_catalog | hashcharextended | bigint | "char", bigint | func
pg_catalog | hashenum | integer | anyenum | func
pg_catalog | hashenumextended | bigint | anyenum, bigint | func
pg_catalog | hashfloat4 | integer | real | func
pg_catalog | hashfloat4extended | bigint | real, bigint | func
pg_catalog | hashfloat8 | integer | double precision | func
pg_catalog | hashfloat8extended | bigint | double precision, bigint | func
pg_catalog | hashhandler | index_am_handler | internal | func
pg_catalog | hashinet | integer | inet | func
pg_catalog | hashinetextended | bigint | inet, bigint | func
pg_catalog | hashint2 | integer | smallint | func
pg_catalog | hashint2extended | bigint | smallint, bigint | func
pg_catalog | hashint4 | integer | integer | func
pg_catalog | hashint4extended | bigint | integer, bigint | func
pg_catalog | hashint8 | integer | bigint | func
pg_catalog | hashint8extended | bigint | bigint, bigint | func
pg_catalog | hashmacaddr | integer | macaddr | func
pg_catalog | hashmacaddr8 | integer | macaddr8 | func
pg_catalog | hashmacaddr8extended | bigint | macaddr8, bigint | func
pg_catalog | hashmacaddrextended | bigint | macaddr, bigint | func
pg_catalog | hashname | integer | name | func
pg_catalog | hashnameextended | bigint | name, bigint | func
pg_catalog | hashoid | integer | oid | func
pg_catalog | hashoidextended | bigint | oid, bigint | func
pg_catalog | hashoidvector | integer | oidvector | func
pg_catalog | hashoidvectorextended | bigint | oidvector, bigint | func
pg_catalog | hashtext | integer | text | func
pg_catalog | hashtextextended | bigint | text, bigint | func
pg_catalog | hashtid | integer | tid | func
pg_catalog | hashtidextended | bigint | tid, bigint | func
pg_catalog | hashvarlena | integer | internal | func
pg_catalog | hashvarlenaextended | bigint | internal, bigint | func
(47 rows)
Caveats
If you want to have consistent hashes across different systems, make sure you have the same collation behaviour.
The built-in collatable data types are text
, varchar
, and char
. If you have different collation options, you can see different hash values. For example the sort order of the strings ‘a-a’ and ‘a+a’ flipped in glibc 2.28 (Debian 10, RHEL 8) compared to earlier releases.
If you wish to use the hashes on the same machine, you need not worry, as long as you do not update glibc
or use a different collation.
See more details at: https://www.citusdata.com/blog/2020/12/12/dont-let-collation-versions-corrupt-your-postgresql-indexes/