1

I have 5 jobs which contain a transaction block (similar code just modifying different tables) which get executed through pg_cron background workers and are passed as a naked string to scheduler. The blocks consist of BEGIN;(some queries)END; and all of them succeed when ran as a query, but they fail when ran through pg_cron... The jobs are ran in parallel and they fail 80% of the time with FATAL: EndTransactionBlock: unexpected state BEGIN and I had very few times where one of them succeeded.

I only managed to reproduce this using background workers in Azure/Robin k8s cluster and stolon as a database, can't really reproduce it locally.

Here is the modified script: `

BEGIN;
        LOCK TABLE public.dummy IN ACCESS EXCLUSIVE MODE;
        ALTER TABLE public.dummy DISABLE TRIGGER ALL;
        CREATE INDEX IF NOT EXISTS idx_connecteddummy_id ON public.connecteddummy(id);
        CREATE INDEX IF NOT EXISTS idx_connecteddummy_originalid ON public.connecteddummy(originalid);
        CREATE TEMP TABLE connecteddummy_ids ON COMMIT DROP AS (
            SELECT d.Id 
            FROM public.dummy d 
            LEFT JOIN public.connecteddummy cd 
            ON d.Id = cd.Id 
            LEFT JOIN public.connecteddummy cdd 
            ON d.Id = cdd.OriginalId 
            WHERE cd.Id IS null 
            AND cdd.OriginalId IS null 
            ORDER BY d.Id 
            LIMIT 50000
        );
        CREATE INDEX IF NOT EXISTS idx_ids_id ON connecteddummy_ids(Id);
        DELETE FROM public.dummy d 
        USING connecteddummy_ids v 
        WHERE d.Id = v.Id;
        DROP INDEX IF EXISTS idx_connecteddummy_Id;
        DROP INDEX IF EXISTS idx_connecteddummy_OriginalId;
        ALTER TABLE public.dummy ENABLE TRIGGER ALL;
    END;

`

Script is used for removing unreferenced entities in Dummy table (unreferenced entities meaning that id and original id do not exist in connected dummy table)

nzivkovic
  • 11
  • 2
  • And do you start with DO ? https://www.postgresql.org/docs/current/sql-do.html – Frank Heikens Apr 03 '23 at 12:07
  • @FrankHeikens Nope, it starts with BEGIN and ends with END. I edited the question and added modified script that I use. Maybe it will help. – nzivkovic Apr 03 '23 at 12:33
  • Have you considered adding an [`ON DELETE CASCADE ON UPDATE CASCADE`](https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-EXCLUDE) to the foreign key column definition? That could potentially eliminate the need to keep running this job: [demo](https://dbfiddle.uk/2GDkhajI). – Zegarek Apr 03 '23 at 12:41
  • @Zegarek Yes, I used that but with very large dbs that operation takes long time (dummy table 5+ million rows and connected dummy 50+ million rows) and paired with some API and 50+ k unreferenced entities in 5 different tables = user not happy :) So, I moved that operation to be behind the scenes as users do no depend on it. Just a table cleanup. – nzivkovic Apr 03 '23 at 12:53
  • @nzivkovic That makes sense. [This](https://www.postgresql.org/message-id/Pine.LNX.4.61.0409061806480.2431%40sablons.cri.ensmp.fr) would suggest the error you're getting is caused by that code getting routine-wrapped by either you or the scheduler, making `BEGIN;` (or transactions at all) not allowed. That would make sense for a scheduler configured to make sure each task always runs in its own transaction. Replacing `BEGIN;...END;` with [`DO$proc$BEGIN...END$proc$;`](https://www.postgresql.org/docs/current/sql-do.html) would make it more wrap-compatible: [demo](https://dbfiddle.uk/Bo4ZhvDp). – Zegarek Apr 03 '23 at 14:45
  • Answered [here](https://stackoverflow.com/a/76284059/5298879). Turns out [`pg_cron` does in fact always wrap everything in transactions](https://github.com/citusdata/pg_cron/blob/main/src/pg_cron.c#L2157): *"We don't allow transaction-control commands like COMMIT and ABORT here. The entire SQL statement is executed as a single transaction which commits if no errors are encountered."* – Zegarek May 19 '23 at 17:50

0 Answers0