3

I am trying to make a self-managing partition table setup with Postgres. It all revolves around this function but I can't seem to get Postgres to accept my table names. Any ideas or examples of self-managing partition table trigger functions?

My current function:

DECLARE
day integer;
year integer;
tablename text;
startdate text;
enddate text;
BEGIN
day:=date_part('doy',to_timestamp(NEW.date));
year:=date_part('year',to_timestamp(NEW.date));
tablename:='pings_'||year||'_'||day||'_'||NEW.id;
-- RAISE EXCEPTION 'tablename=%',tablename;
PERFORM 'tablename' FROM pg_tables WHERE 'schemaname'=tablename;
-- RAISE EXCEPTION 'found=%',FOUND;
IF FOUND <> TRUE THEN
    startdate:=date_part('year',to_timestamp(NEW.date))||'-'||date_part('month',to_timestamp(NEW.date))||'-'||date_part('day',to_timestamp(NEW.date));
    enddate:=startdate::timestamp + INTERVAL '1 day';
    EXECUTE 'CREATE TABLE $1 (
        CHECK ( date >= DATE $2 AND date < DATE $3 )
    ) INHERITS (pings)' USING quote_ident(tablename),startdate,enddate;
END IF;
EXECUTE 'INSERT INTO $1 VALUES (NEW.*)' USING quote_ident(tablename);
RETURN NULL;
END;

I want it to auto-create a table called pings_YEAR_DOY_ID but it always fails with:

2011-10-24 13:39:04 CDT [15804]: [1-1] ERROR:  invalid input syntax for type double precision: "-" at character 45
2011-10-24 13:39:04 CDT [15804]: [2-1] QUERY:  SELECT date_part('year',to_timestamp( $1 ))+'-'+date_part('month',to_timestamp( $2 ))+'-'+date_part('day',to_timestamp( $3 ))
2011-10-24 13:39:04 CDT [15804]: [3-1] CONTEXT:  PL/pgSQL function "ping_partition" line 15 at assignment
2011-10-24 13:39:04 CDT [15804]: [4-1] STATEMENT:  INSERT INTO pings VALUES (0,0,5);

TRY 2

After applying the changes and modifying it some more (date is a unixtimestamp column, my thinking being that an integer column is faster than a timestamp column when selecting). I get the below error, not sure if I am using the proper syntax for USING NEW?

Updated function:

CREATE FUNCTION ping_partition() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$DECLARE
day integer;
year integer;
tablename text;
startdate text;
enddate text;
BEGIN
day:=date_part('doy',to_timestamp(NEW.date));
year:=date_part('year',to_timestamp(NEW.date));
tablename:='pings_'||year||'_'||day||'_'||NEW.id;
-- RAISE EXCEPTION 'tablename=%',tablename;
PERFORM 'tablename' FROM pg_tables WHERE 'schemaname'=tablename;
-- RAISE EXCEPTION 'found=%',FOUND;
IF FOUND <> TRUE THEN
    startdate := to_char(to_timestamp(NEW.date), 'YYYY-MM-DD');
    enddate:=startdate::timestamp + INTERVAL '1 day';
    EXECUTE 'CREATE TABLE ' || quote_ident(tablename) || ' (
        CHECK ( date >= EXTRACT(EPOCH FROM DATE ' || quote_literal(startdate) || ')
            AND date < EXTRACT(EPOCH FROM DATE ' || quote_literal(enddate) || ') )
    ) INHERITS (pings)';
END IF;
EXECUTE 'INSERT INTO ' || quote_ident(tablename) || ' SELECT $1' USING NEW; 
RETURN NULL;
END;
$_$;

My statement:

INSERT INTO pings VALUES (0,0,5);

SQL error:

ERROR:  column "date" is of type integer but expression is of type pings
LINE 1: INSERT INTO pings_1969_365_0 SELECT $1
                                            ^
HINT:  You will need to rewrite or cast the expression.
QUERY:  INSERT INTO pings_1969_365_0 SELECT $1
CONTEXT:  PL/pgSQL function "ping_partition" line 22 at EXECUTE statement
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
ehiller
  • 1,346
  • 17
  • 32

4 Answers4

12

Note: Since Postgres 10 declarative partitioning is typically superior to partitioning by inheritance as used in this case.


You are mixing double precision output of date_part() with text '-'. That doesn't make sense to PostgreSQL. You would need an explicit cast to text. But there is a much simpler way to do all of this:

startdate:=date_part('year',to_timestamp(NEW.date))
||'-'||date_part('month',to_timestamp(NEW.date))
||'-'||date_part('day',to_timestamp(NEW.date));

Use instead:

startdate := to_char(NEW.date, 'YYYY-MM-DD');

This makes no sense either:

EXECUTE 'CREATE TABLE $1 (
        CHECK (date >= DATE $2 AND date < DATE $3 )
    ) INHERITS (pings)' USING quote_ident(tablename),startdate,enddate;

You can only supply values with the USING clause. Read the manual here. Try instead:

EXECUTE 'CREATE TABLE ' || quote_ident(tablename) || ' (
            CHECK ("date" >= ''' || startdate || ''' AND
                   "date" <  ''' || enddate   || '''))
            INHERITS (ping)';

Or better yet, use format(). See below.

Also, like @a_horse answered: You need to put your text values in single quotes.

Similar here:

EXECUTE 'INSERT INTO $1 VALUES (NEW.*)' USING quote_ident(tablename);

Instead:

EXECUTE 'INSERT INTO ' || quote_ident(tablename) || ' VALUES ($1.*)'
USING NEW;

Related answer:

Aside: While "date" is allowed for a column name in PostgreSQL it is a reserved word in every SQL standard. Don't name your column "date", it leads to confusing syntax errors.

Complete working demo

CREATE TABLE ping (ping_id integer, the_date date);

CREATE OR REPLACE FUNCTION trg_ping_partition()
  RETURNS trigger
  LANGUAGE plpgsql SET client_min_messages = 'WARNING' AS
$func$
DECLARE
   _schema text := 'public';  -- double-quoted if necessary
   _tbl text := to_char(NEW.the_date, '"ping_"YYYY_DDD_') || NEW.ping_id;
BEGIN
   EXECUTE format('CREATE TABLE IF NOT EXISTS %1$s.%2$s
                   (CHECK (the_date >= %3$L
                       AND the_date <  %4$L)) INHERITS (%1$s.ping)'
           , _schema   -- %1$s
           , _tbl      -- %2$s  -- legal(!) name needs no quotes
           , to_char(NEW.the_date,     'YYYY-MM-DD')  -- %3$L
           , to_char(NEW.the_date + 1, 'YYYY-MM-DD')  -- %4$L
           );

   EXECUTE 'INSERT INTO ' || _tbl || ' VALUES ($1.*)'
   USING NEW; 

   RETURN NULL;
END
$func$;

CREATE TRIGGER insbef
BEFORE INSERT ON ping
FOR EACH ROW EXECUTE FUNCTION trg_ping_partition();

Postgres 9.1 added the clause IF NOT EXISTS for CREATE TABLE. See:

Postgres 11 added the more appropriate syntax variant EXECUTE FUNCTION for triggers. Use EXECUTE PROCEDURE in older versions. See:

to_char() can take a date as $1. That's converted to timestamp automatically. See:

I SET client_min_messages = 'WARNING' for the scope of the function to silence the flood of notices that would otherwise be raised on conflict by IF NOT EXISTS.

Multiple other simplifications and improvements. Compare the code.

Tests:

INSERT INTO ping VALUES (1, now()::date);
INSERT INTO ping VALUES (2, now()::date);
INSERT INTO ping VALUES (2, now()::date + 1);
INSERT INTO ping VALUES (2, now()::date + 1);

fiddle
OLD sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks that helped a ton (and I updated my post). I think I am having some issues with the syntax of USING NEW; in EXECUTE, but I can't find any good examples online other than what you had pointed me to. – ehiller Oct 25 '11 at 16:34
  • @variable: OK, I amended my anser with a complete solution. But read Pavel's answer carefully. Better to create the partition tables ahead of time! – Erwin Brandstetter Oct 26 '11 at 08:11
  • I have modified your code to my final function, thanks so much for the help. I have the auto delete, indexing, and ids incorporated. Also I have the date column as a bigint rather than date, I had guessed this to be faster on SELECTS. Thanks! – ehiller Oct 26 '11 at 15:31
3

Dynamic partitioning in PostgreSQL is just a bad idea. Your code is not safe in a multi-user environment. For it to be safe you would have to use locks, which slows down execution. The optimal number of partitions is about one hundred. You can easily create that many well in advance to dramatically simplify the logic necessary for partitioning.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Pavel Stehule
  • 42,331
  • 5
  • 91
  • 94
  • Could you expand on why it's not safe in a multi-user environment? (I assume you mean "multi-connection environment.") This isn't immediately obvious to me. – jpmc26 May 19 '17 at 23:32
  • Without locks, there are race conditions. You can have two parallel inserts - `IF NOT EXISTS` is true for both connections, and `EXECUTE` will be processed in both connections - one should to fail, and one insert will be lost. Probably for some high frequent data and not important data it can not be issue, but still it is dirty code. – Pavel Stehule May 20 '17 at 03:55
2

You need to put your date literals in single quotes. Currently you are executing something like this:

 CHECK ( date >= DATE 2011-10-25 AND date < DATE 2011-11-25 )

which is invalid. In this case 2011-10-25 is interpreted as 2011 minus 10 minus 25

Your code needs to create the SQL using single quotes around the date literal:

CHECK ( date >= DATE '2011-10-25' AND date < DATE '2011-11-25' )
1

I figured out the entirety and it works great, even have an auto-delete after 30 days. I hope this helps out future people looking for an autopartition trigger function.

CREATE FUNCTION ping_partition() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
DECLARE
_keepdate text;
_tablename text;
_startdate text;
_enddate text;
_result record;
BEGIN
_keepdate:=to_char(to_timestamp(NEW.date) - interval '30 days', 'YYYY-MM-DD');
_startdate := to_char(to_timestamp(NEW.date), 'YYYY-MM-DD');
_tablename:='pings_'||NEW.id||'_'||_startdate;
PERFORM 1
FROM   pg_catalog.pg_class c
JOIN   pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE  c.relkind = 'r'
AND    c.relname = _tablename
AND    n.nspname = 'pinglog';
IF NOT FOUND THEN
    _enddate:=_startdate::timestamp + INTERVAL '1 day';
    EXECUTE 'CREATE TABLE pinglog.' || quote_ident(_tablename) || ' (
        CHECK ( date >= EXTRACT(EPOCH FROM DATE ' || quote_literal(_startdate) || ')
            AND date < EXTRACT(EPOCH FROM DATE ' || quote_literal(_enddate) || ')
            AND id = ' || quote_literal(NEW.id) || '
        )
    ) INHERITS (pinglog.pings)';
    EXECUTE 'CREATE INDEX ' || quote_ident(_tablename||'_indx1') || ' ON pinglog.' || quote_ident(_tablename) || ' USING btree (microseconds) WHERE microseconds IS NULL';
    EXECUTE 'CREATE INDEX ' || quote_ident(_tablename||'_indx2') || ' ON pinglog.' || quote_ident(_tablename) || ' USING btree (date, id)';
    EXECUTE 'CREATE INDEX ' || quote_ident(_tablename||'_indx3') || ' ON pinglog.' || quote_ident(_tablename) || ' USING btree (date, id, microseconds) WHERE microseconds IS NULL';
END IF;
EXECUTE 'INSERT INTO ' || quote_ident(_tablename) || ' VALUES ($1.*)' USING NEW;
FOR _result IN SELECT * FROM pg_tables WHERE schemaname='pinglog' LOOP
    IF char_length(substring(_result.tablename from '[0-9-]*$')) <> 0 AND (to_timestamp(NEW.date) - interval '30 days') > to_timestamp(substring(_result.tablename from '[0-9-]*$'),'YYYY-MM-DD') THEN
        -- RAISE EXCEPTION 'timestamp=%,table=%,found=%',to_timestamp(substring(_result.tablename from '[0-9-]*$'),'YYYY-MM-DD'),_result.tablename,char_length(substring(_result.tablename from '[0-9-]*$'));
        -- could have it check for non-existant ids as well, or for archive bit and only delete if the archive bit is not set
        EXECUTE 'DROP TABLE ' || quote_ident(_result.tablename);
    END IF;
END LOOP;
RETURN NULL;
END;
$_$;
ehiller
  • 1,346
  • 17
  • 32