25

I have Postgresql Function which has to INSERT about 1.5 million data into a table. What I want is I want to see the table getting populated with every one records insertion. Currently what is happening when I am trying with say about 1000 records, the table gets populated only after the complete function gets executed. If I stop the function half way through, no data gets populated. How can I make the record committed even if I stop after certain number of records have been inserted?

Yousuf Sultan
  • 3,055
  • 8
  • 35
  • 63
  • Possible duplicate of [Are PostgreSQL functions transactional?](https://stackoverflow.com/questions/12778209/are-postgresql-functions-transactional) – rogerdpack Nov 24 '17 at 17:16

4 Answers4

18

This can be done using dblink. I showed an example with one insert being committed you will need to add your while loop logic and commit every loop. You can http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html

CREATE OR REPLACE FUNCTION log_the_dancing(ip_dance_entry text)
RETURNS INT AS
$BODY$
    DECLARE
    BEGIN
        PERFORM dblink_connect('dblink_trans','dbname=sandbox port=5433 user=postgres');
        PERFORM dblink('dblink_trans','INSERT INTO dance_log(dance_entry) SELECT ' || '''' || ip_dance_entry || '''');
        PERFORM dblink('dblink_trans','COMMIT;');
        PERFORM dblink_disconnect('dblink_trans'); 

        RETURN 0;
    END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

ALTER FUNCTION log_the_dancing(ip_dance_entry text)
  OWNER TO postgres;

BEGIN TRANSACTION;
  select log_the_dancing('The Flamingo');
  select log_the_dancing('Break Dance');
  select log_the_dancing('Cha Cha');
ROLLBACK TRANSACTION;

--Show records committed even though we rolled back outer transaction
select *
from dance_log;
ruvim
  • 7,151
  • 2
  • 27
  • 36
Kuberchaun
  • 29,160
  • 7
  • 51
  • 59
  • 2
    I typically only use this method when logging. Outside of logging it's usually not the correct approach to do batch commits, but I'll leave it to you to decide that for your use case. – Kuberchaun Mar 12 '14 at 13:45
  • Is dblink available for postgresql 9.2? – Yousuf Sultan Mar 13 '14 at 06:52
  • Yes, you should be able to execute CREATE EXTENSION dblink; from a SQL session connected as the postgres user and test it out. – Kuberchaun Mar 13 '14 at 13:32
  • 2
    @Bob It's also useful for long running load/transform/generation processes when you want to commit at some interval to avoid losing work (and therefore time). It could be argued that this belongs at the application tier (or in a shell script or something), but sometimes I have to do this kind of work manually and would really appreciate a lighter weight way of doing it in DB. – jpmc26 Feb 17 '15 at 22:35
  • Opening a new dblink connection every time you want to log some event is OK for tracking batch processes, but not for general log capture. Any other options here? – Andrew Wolfe May 01 '15 at 15:12
  • @AndrewWolfe Take a look at this idea https://github.com/GoatWalker/plog. The idea was to try and reusing a dblink connection if one existed to help reduce creating a new one every single time. It's been years since I played around with that, but maybe there's some value in it. – Kuberchaun Feb 11 '16 at 17:44
11

If you have the flexibility to change from function to procedure, from PostgreSQL 12 onwards you can do internal commits if you use procedures instead of functions, invoked by CALL command. Therefore your function will be changed to a procedure and invoked with CALL command: e.g:

    CREATE PROCEDURE transaction_test2()
    LANGUAGE plpgsql
    AS $$
    DECLARE
     r RECORD;
    BEGIN
      FOR r IN SELECT * FROM test2 ORDER BY x LOOP
        INSERT INTO test1 (a) VALUES (r.x);
        COMMIT;
      END LOOP;
    END;
    $$;

    CALL transaction_test2();

More details about transaction management regarding Postgres are available here: https://www.postgresql.org/docs/12/plpgsql-transactions.html

ChrisGPT was on strike
  • 127,765
  • 105
  • 273
  • 257
Cristian Balint
  • 299
  • 3
  • 7
  • Note that the procedure must be called from "top level" or another procedure, with no function call in the chain. As the linked docs state: _Transaction control is only possible in CALL or DO invocations from the top level or nested CALL or DO invocations without any other intervening command. _ – petre May 11 '22 at 08:29
  • I think this is available since PG 11? https://www.postgresql.org/docs/11/plpgsql-transactions.html – Rafs Dec 16 '22 at 10:44
9

What you're asking for is generally called an autonomous transaction.

PostgreSQL does not support autonomous transactions at this time (9.4).

To properly support them it really needs stored procedures, not just the user-defined functions it currently supports. It's also very complicated to implement autonomous tx's in PostgreSQL for a variety of internal reasons related to its session and process model.

For now, use dblink as suggested by Bob.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
3

For Postgresql 9.5 or newer you can use dynamic background workers provided by pg_background extension. It creates autonomous transaction. Please, refer the github page of the extension. The sollution is better then db_link. There is a complete guide on Autonomous transaction support in PostgreSQL. There is a third way to start autonomous transaction in Postgres, but some patching neede. Please see Peter's Eisentraut patch proposal for OracleDB-style transactions.

shcherbak
  • 738
  • 8
  • 14