Partitioning on a UUID in Postgres 12 or 13

Question

Problem

I've been asked to copy a fair bit of data to Postgres in a new table. The data contains assembly component lists, simplified in the table definition below:

CREATE TABLE IF NOT EXISTS assembly_item (
    id               uuid       NOT NULL DEFAULT  NULL,
    assembly_id.     uuid.      NOT NULL DEFAULT  NULL,
    done_dts         timestamp  NOT NULL DEFAULT 'epoch', 

CONSTRAINT assembly_item_pk
    PRIMARY KEY (id) 
);

There are dozens of attributes in the original, and a few hundred million rows at the moment. These records are spread across several installations, and are not stored in Postgres locally. Insertions on this table add up quickly, and it's going to grow to 1B rows within the year, is the guess. Date is rarely updated, and never deleted. (It could happen in time, but not often.) The same id is never duplicated with different assembly_id values. So, unique at the partition level on id is safe. The goal here is to offload this data onto Postgres, and leave only very recent data in a cache on the local servers.

This looks like a natural candidate for partitioning, and I'm looking for some guidance on a sensible strategy. You can see from the simplified structure that we've got a unique row id, a parent assembly_id, and a timestamp. I've looked at the existing queries in the original database, and the main search field is assembly_id, the parent record identifier. The cardinality between assembly and assembly_item is around 1:200.

In order to make partitioning most useful, it seems like the data needs to be split based on a value that enables the query planner to prune out partitions intelligently. I've thought of a few ideas, but don't have the 200M rows to test agains yet. In the meantime, what I'm considering is:

Partition by month using either RANGE, or by LIST on YYYY-MM of the done_dts. Rewrite all of the queries to scope by date range.
Partition by HASH against the first two characters of assembly_id::text, giving me 256 partitions with fairly equal sizes. I think that this lets us search on assembly_id and prune out many partitions that won't have matches, but it looks pretty weird when I set it up.

I appreciate that I'm asking a somewhat speculative question, all I'm hoping for here are some pointers that might make my first attempt more successful. Once I've got a bit data set, I can experiment more directly.

I've included experimental setup code, with only a sampling of the partitions listed for brevity.

Sample setup using a `LIST` partition

------------------------------------
-- Define table partitioned by list
------------------------------------
-- Could alternatively use RANGE here to partition by month.

BEGIN;

-- Drop parent table, if they exists.
-- This destroys ALL partitions automatically, even without a CASCADE clause.
DROP TABLE IF EXISTS assembly_item_list CASCADE;

CREATE TABLE IF NOT EXISTS assembly_item_list (
    id                              uuid          NOT NULL DEFAULT NULL,
    assembly_id                     uuid          NOT NULL DEFAULT NULL,
    assembly_done_dts               timestamp     NOT NULL DEFAULT 'epoch', -- Copied in from assembly.done_dts when rows are pushed to Postgres.
    year_and_month                  citext        NOT NULL DEFAULT NULL,    -- YYYY-MM from assembly_done_dts, calculated in insert function. Can't use a generated column as a partition key.

-- Reminder: id values come from the various source tables in IB. The upsert writes over matches ON CONFLICT with this ID.
-- Note: You *must* include the partition key in the primary key. It's a rule.
CONSTRAINT assembly_item_list_pk
    PRIMARY KEY (year_and_month, id) 
) PARTITION BY LIST (year_and_month);

-- Previous year partitions built here...

-- Build out 2021 completely.
CREATE TABLE assembly_item_list_2021_01 partition of assembly_item_list HASH (assembly_id) ('2021-01');
CREATE TABLE assembly_item_list_2021_02 partition of assembly_item_list HASH (assembly_id) ('2021-02');
-- etc.

-- In case I screw up at the end of the year....
CREATE TABLE assembly_item_list_default partition of assembly_item_list default; 

COMMIT;

Sample setup using a `HASH` partition.

------------------------------------
-- Define table partitioned by hash
------------------------------------

BEGIN;

-- Drop parent table, if they exists.
-- This destroys ALL partitions automatically, even without a CASCADE clause.
DROP TABLE IF EXISTS assembly_item_hash CASCADE;

CREATE TABLE IF NOT EXISTS assembly_item_hash (
    id                              uuid          NOT NULL DEFAULT NULL,
    assembly_id                     uuid          NOT NULL DEFAULT NULL,
    assembly_done_dts               timestamp     NOT NULL DEFAULT 'epoch', -- Copied in from assembly.done_dts when rows are pushed to Postgres.
    partition_key                   text          NOT NULL DEFAULT NULL,    -- '00', '0A', etc. Populated in a BEFORE INSERT trigger on the partition. Can't use a generated column as a partition key, can't use a column reference in DEFAULT. 

-- Reminder: id values come from the various source tables in IB. The upsert writes over matches ON CONFLICT with this ID.
-- Note: You *must* include the partition key in the primary key. It's a rule.
CONSTRAINT assembly_item_hash_pk
    PRIMARY KEY (partition_key, id) 
) PARTITION BY HASH (partition_key);

-----------------------------------------------------
-- Create trigger function to populate partition_key
-----------------------------------------------------
-- The partition key is a two-character hex string, like '00', '3E', and so on.
CREATE OR REPLACE FUNCTION set_partition_key()
    RETURNS TRIGGER AS $$
    BEGIN
        NEW.partition_key = UPPER(LEFT(NEW.assembly_id, 2));
        RETURN NEW;
END;
$$ language plpgsql IMMUTABLE; -- I don't think that I need to worry about IMMUTABLE here. 01234567890ABCDEF shouldn't break. 

-----------------------------------------------------
-- Build partitions
-----------------------------------------------------
-- Note: Have to assign triggers to partitions individually.
-- Seems that it would be easier to add the logic to my central insert function.

CREATE TABLE assembly_item_hash_00 partition of assembly_item_hash FOR VALUES WITH (modulus 256, remainder 0);
CREATE TRIGGER set_partition_key_trigger_00
    BEFORE INSERT OR UPDATE ON assembly_item_hash_00
    FOR EACH ROW
    EXECUTE PROCEDURE set_partition_key();

CREATE TABLE assembly_item_hash_01 partition of assembly_item_hash FOR VALUES WITH (modulus 256, remainder 1);
CREATE TRIGGER set_partition_key_trigger_01
    BEFORE INSERT OR UPDATE ON assembly_item_hash_01
    FOR EACH ROW
    EXECUTE PROCEDURE set_partition_key();
    
-- And so on for all 256 partitions.

COMMIT;

Any advice? Really, anything that comes to mind?

`The same id is never duplicated with different assembly_id values. So, unique at the partition level on id is safe.` You mean, the same `id` is never duplicated. Period. Else it couldn't be the overall PK? — Erwin Brandstetter, Mar 29 '21 at 04:07
To `id` is genuinely unique, `assembly_id` is not unique. So, yes, I could make `id` the PK, but I want to partition by the `assembly_id` to keep records with the same parent ID in the same partition. If I try to partition by `assembly_id`, Postgres reports that I must include the partition key in the primary key definition. — Morris de Oryx, Mar 29 '21 at 04:11
I also just looked at the `hash` setup code and....it looks like I was being confused. I'm splitting out a substring of the first two characters, and then hashing that. Seems like all I need to do is `hash (assembly_id)`, and then `modulo` to get whatever number of partitions I prefer. That's appealing as the distribution across portions should be approximately even, while keeping `assembly_list` records with the same parent `assembly_id` in the same partition. — Morris de Oryx, Mar 29 '21 at 04:13
The randomness of UUIDs make them very inefficient when the table is bigger than can be cached in RAM. — Rick James, Apr 08 '21 at 18:10
@RickJames Thanks for the comment, but I'm not following. What part of the system is made inefficient because of UUIDs? In our case, we're pooling data from _lots_ of different sources that do not communicate. It was all UUID-based when I got here ;-) — Morris de Oryx, Apr 10 '21 at 07:18

Erwin Brandstetter · Accepted Answer · 2021-03-29T05:14:04.717

Whether date or a UUID-hash will be the better partition key, I cannot say. But I can say this: either of your solutions can be more efficient.

Hash partitioning based on `uuid`

Your plan to add a partition key column and populate it with a trigger function is very inefficient. And unnecessary. (Issues with the trigger function itself aside.)

There seems to be a misunderstanding. You have a comment:

-- Note: You must include the partition key in the primary key. It's a rule.

Not exactly. The manual:

Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

Partition key columns. Not partition keys.
A setup with hash partitioning on (assembly_id) works with a PK on the same column. Like this:

CREATE TABLE IF NOT EXISTS assembly_item_hash (
  assembly_id       uuid      NOT NULL
, id                uuid      NOT NULL
, assembly_done_dts timestamp NOT NULL DEFAULT 'epoch'
, PRIMARY KEY (assembly_id, id)
) PARTITION BY HASH (assembly_id);

CREATE TABLE assembly_item_hash_000 PARTITION OF assembly_item_hash FOR VALUES WITH (MODULUS 256, REMAINDER 0);
CREATE TABLE assembly_item_hash_001 PARTITION OF assembly_item_hash FOR VALUES WITH (MODULUS 256, REMAINDER 1);
-- etc.

Much simpler.

The only downside: the PK index is a larger, uuid occupies 16 bytes.

If that's an issue, you might fall back to that generated partition_key you had in mind. With a trigger per partition. (Ugh, the overhead!) But make the column integer instead of text, and use the much more efficient built-in hash function uuid_hash(). That's the function used for hash partitioning internally. But now we use it explicitly and go for LIST partitioning:

CREATE TABLE IF NOT EXISTS assembly_item_hash (
  id                uuid      NOT NULL
, assembly_id       uuid      NOT NULL
, partition_key     int4      NOT NULL
, assembly_done_dts timestamp NOT NULL DEFAULT 'epoch'
, PRIMARY KEY (partition_key, id)
) PARTITION BY LIST (partition_key);

Adds 4 bytes to each table row, saves 12 bytes from each index item - in theory. Due to alignment padding you lose another 4 bytes in table and index, ending up with the same total space on disk as before (roughly - table and index bloat can differ). Unless "column tetris" allows you to fit in that column more efficiently, to win up to 8 bytes per row total ... See:

Calculating and saving space in PostgreSQL

List partitioning based on `timestamp`

Don't use citext. Needless complication.

Use an integer number for YYYY-MM instead. Smaller, faster. I suggest this base function:

CREATE FUNCTION f_yyyymm(timestamp)
   RETURNS int
   LANGUAGE sql PARALLEL SAFE IMMUTABLE AS
'SELECT (EXTRACT(year FROM $1) * 100 + EXTRACT(month FROM $1))::int';

See:

How do you do date math that ignores the year?

That is *so much better*. Exactly what I was trying to achieve. And, clearly, continuing to pour more code on top was _not_ what was needed. Many thanks! — Morris de Oryx, Mar 29 '21 at 04:19
@MorrisdeOryx: I added some more pointers for possible alternatives. — Erwin Brandstetter, Mar 29 '21 at 04:51
Pure gold! More than happy to avoid the zillions of triggers, I appreciate it when built-in engine features allow me to get things done _without_ trigger overhead, particularly `per row`. Yes, `uuid` for sure....each source database is isolated, so we need `uuid`. I think they're over-used in our case some of the time, but perhaps not here. In any case, they're a fact of life. *Great point* on DTS to `int4` instead of `cutest`, the function is great. And, yes, I do "column Tetris" when I first set up table layouts. — Morris de Oryx, Mar 29 '21 at 06:24
Wait, `hash_uuid`? It works! I don't seem it in the manual...is this one of those things like `hashtext` that's available, but not really documented? I might just be missing it in the docs. — Morris de Oryx, Mar 29 '21 at 06:32
Now I see some of what I missed in the source & source docs: https://docs.huihoo.com/doxygen/postgresql/hashfunc_8c.html https://doxygen.postgresql.org/hashfunc_8c_source.html Lots and lots of type-specific hashes. Fantastic. — Morris de Oryx, Mar 29 '21 at 06:57

Partitioning on a UUID in Postgres 12 or 13

Problem

Sample setup using a LIST partition

Sample setup using a HASH partition.

1 Answers1

Hash partitioning based on uuid

List partitioning based on timestamp

Sample setup using a `LIST` partition

Sample setup using a `HASH` partition.

Hash partitioning based on `uuid`

List partitioning based on `timestamp`