I have a utility in my application where I need to perform a bulk load of INSERT, UPDATE and DELETE operations. I am trying to wrap this in a transaction so that once the system is invoked and the data is fed to it, either all of it is added to the database or none of it is.

My concern is the boundary conditions here: how many INSERT, UPDATE and DELETE statements can I have in one transaction? Is the transaction size configurable?
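
For reference, this is roughly the shape of what I have in mind (the `orders` table and its columns are just made-up placeholders):

```sql
-- Either every statement takes effect, or none of them does.
BEGIN;

INSERT INTO orders (id, status) VALUES (101, 'new');
UPDATE orders SET status = 'shipped' WHERE id = 42;
DELETE FROM orders WHERE status = 'cancelled';

COMMIT;   -- on any error, issue ROLLBACK instead and nothing above is applied
```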

phil pirozhkov
Salman A. Kagzi

4 Answers

I don't think there's a maximum amount of work that can be performed in a transaction. Data keeps getting added to the table files, and eventually the transaction either commits or rolls back: as I understand it, the result gets stored in pg_clog; if it rolls back, the space will eventually be reclaimed by VACUUM. So it's not as if the ongoing transaction's work is held in memory and flushed at commit time, for instance.

araqnid
  • This is only partially correct. Inside each transaction is a command counter that deals with visibility inside the transaction. This is a 32-bit number that will eventually overflow if you have a very large transaction (billions of commands). VACUUM, pg_clog etc. only deal with the total number of transactions in the system, not with what happens inside one of them. – Magnus Hagander Aug 28 '09 at 09:11
  • @MagnusHagander Does this 32-bit number still hold true? Has this number been updated to 64-bit? – Kuberchaun Jan 25 '13 at 14:55
  • @MagnusHagander if I understand your clarification correctly, pg_clog prior to v10 (and pg_xact now) only contains transaction metadata like the command counter, and the actual data still ends up in pg_xlog/pg_wal regardless of whether the transaction has been committed yet. This is what I am seeing in my quick tests. The answer left me under the impression that data is moved to the wal files only after commit – atomic77 Jul 19 '18 at 14:26
  • so in my `.sql` file I can do something like:- `begin; lots of insert sql statements (around 2 million insert statements); commit;` – firstpostcommenter Dec 03 '21 at 14:26

A single transaction can run approximately two billion commands (2^31, minus IIRC a tiny bit of overhead; actually, come to think of it, that may be 2^32, since the command counter is unsigned, I think).

Each of those commands can modify multiple rows, of course.
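
For example (with a made-up `measurements` table), a multi-row INSERT or a COPY counts as a single command against that counter, however many rows it adds:

```sql
BEGIN;

-- One command, three rows: the command counter is incremented once.
INSERT INTO measurements (sensor_id, reading)
VALUES (1, 20.1),
       (1, 20.4),
       (2, 19.8);

-- A COPY of an entire file is likewise a single command
-- (server-side COPY needs file-read privileges on the server).
COPY measurements (sensor_id, reading) FROM '/tmp/measurements.csv' WITH (FORMAT csv);

COMMIT;
```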

Magnus Hagander

For a project I work on, I perform 20 million INSERTs. I tried it with one big transaction and with one transaction for every million INSERTs, and the performance seems exactly the same.

PostgreSQL 8.3
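
As a rough sketch of what that comparison looks like in a plain `.sql` script (the `items` table and the `generate_series` ranges are stand-ins, not the actual project code):

```sql
-- Variant A: one big transaction around everything.
BEGIN;
INSERT INTO items (n) SELECT g FROM generate_series(1, 20000000) AS g;
COMMIT;

-- Variant B: one transaction per million rows.
BEGIN;
INSERT INTO items (n) SELECT g FROM generate_series(1, 1000000) AS g;
COMMIT;
BEGIN;
INSERT INTO items (n) SELECT g FROM generate_series(1000001, 2000000) AS g;
COMMIT;
-- ...and so on for the remaining batches.
```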

bortzmeyer
  • Was the system local? I think if you did this on a system where latency is a factor, the performance would differ. – user2677679 Feb 03 '17 at 07:34
  • The performance of your program does not differ. But how about the performance of the other users? – ceving Aug 17 '20 at 12:55

I believe the maximum amount of work is limited by your log file size. The database will never allow itself to get into a state where it cannot roll back, so if you consume all your log space during the transaction, it will halt until you give it more space or roll back. This is generally true for all databases.

I would recommend breaking your updates into manageable chunks that take at most a couple of minutes of execution time each; that way you know about a problem earlier (e.g. what normally takes 1 minute is still running after 10 minutes... hmmm, did someone drop an index?).
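
A sketch of one way to do that kind of chunking, assuming PostgreSQL 11 or later (where an anonymous DO block run outside an explicit transaction may issue COMMIT) and a hypothetical `staging_rows` table:

```sql
DO $$
DECLARE
  rows_deleted integer;
BEGIN
  LOOP
    -- Remove one manageable chunk at a time.
    DELETE FROM staging_rows
    WHERE id IN (SELECT id FROM staging_rows LIMIT 10000);
    GET DIAGNOSTICS rows_deleted = ROW_COUNT;

    COMMIT;   -- each chunk becomes durable on its own

    EXIT WHEN rows_deleted = 0;
  END LOOP;
END
$$;
```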

  • This is not true for PostgreSQL. We can recycle log space during a running transaction. If you are doing archive logging, you will obviously need space in the archive location, but for the local transaction log it's not necessary. (You will need the actual disk space for the data on disk, of course.) – Magnus Hagander Aug 28 '09 at 09:12