
I have to insert a large number of log records into a table every hour, and I don't care about integrity errors or constraint violations that happen in the process.

If I disable autocommit and do a bulk insert, the cursor won't insert anything beyond the row where the transaction failed. Is there a way around this?

One hack is to handle this at the application level: implement an n-sized buffer and do bulk inserts. If something fails in that transaction, recursively repeat the insert for the first half and then the second half of the buffer:

def insert(buffer):
    # bulk_insert() and connection come from the surrounding application code.
    if not buffer:
        return
    try:
        bulk_insert(buffer)
        connection.commit()  # commit now so a later rollback can't undo this chunk
    except Exception:  # e.g. an IntegrityError raised by the driver
        connection.rollback()

        if len(buffer) == 1:
            return  # a single bad row: skip it

        # Bisect and retry each half independently.
        marker = len(buffer) // 2  # integer division; len(buffer)/2 is a float in Python 3
        insert(buffer[:marker])
        insert(buffer[marker:])

But I'm really hoping this can be achieved with something built into Postgres?

meson10
  • Did you search this site before asking this? It's a real FAQ. – Craig Ringer Jan 11 '14 at 03:08
  • Actually I did. If you like, I can share links to the questions I deemed similar. Mostly, someone would end up answering with "do not make an update on this column" or "change your fkey". I am plainly looking for a generic answer, which btw you have provided. Thank you :-) – meson10 Jan 11 '14 at 03:47
  • No worries. It's worth mentioning what you tried / looked into first, that's all. Sometimes it'll save you from getting the same things suggested to you all over again. – Craig Ringer Jan 11 '14 at 03:53
  • Well, if it fits in a (python?) buffer, it cannot be bulk. BTW: you don't need to rollback on error. An error implies a rollback. – wildplasser Jan 11 '14 at 13:33
  • That code was more for convenience's sake :-) I was going to send batches of values anyway. Say, 10K rows at a time. – meson10 Jan 11 '14 at 15:22

1 Answer


PostgreSQL doesn't provide anything built-in for this. You can use SAVEPOINTs, but they're not much better than individual transactions.
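For completeness, the SAVEPOINT approach looks roughly like this. This is a minimal sketch assuming psycopg2 and a hypothetical logs(message) table; rows stands in for whatever iterable your application has buffered. Each failing row is rolled back to its savepoint and skipped, at the cost of an extra round trip and savepoint bookkeeping per row:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection details
cur = conn.cursor()

for message in rows:  # rows: the application's buffered log records (assumed)
    cur.execute("SAVEPOINT bulk_sp")
    try:
        cur.execute("INSERT INTO logs (message) VALUES (%s)", (message,))
    except psycopg2.IntegrityError:
        # Undo just this row; the rest of the transaction survives.
        cur.execute("ROLLBACK TO SAVEPOINT bulk_sp")
    else:
        cur.execute("RELEASE SAVEPOINT bulk_sp")

conn.commit()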

Treat each insert as an individual transaction and work on making those transactions faster:

  • SET synchronous_commit = off in your session
  • INSERT into an UNLOGGED table, then INSERT INTO ... SELECT the results into the real table after checking (a sketch of this staging approach follows the list)
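Here is a rough sketch of that staging-table route, again assuming psycopg2; the table and column names (logs, logs_stage, message), the duplicate check, and rows are illustrative assumptions, not part of the answer:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection details
cur = conn.cursor()

# Losing staged data on a crash is acceptable for log records, so relax durability.
cur.execute("SET synchronous_commit = off")
cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS logs_stage (message text)")

# Dump everything into the staging table with no checks at all.
cur.executemany("INSERT INTO logs_stage (message) VALUES (%s)",
                [(m,) for m in rows])

# Weed out bad rows in one set operation, then move the rest into the real table.
cur.execute("""
    INSERT INTO logs (message)
    SELECT s.message FROM logs_stage s
    WHERE s.message IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM logs l WHERE l.message = s.message)
""")
cur.execute("TRUNCATE logs_stage")
conn.commit()

The WHERE clause in the final INSERT INTO ... SELECT is where whatever validity checks you actually need would live.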

Here's an earlier closely related answer which also links to more information. I haven't marked this as a duplicate because the other one is specific to upsert-like data loading, while you're interested in more general error handling.

Craig Ringer
  • I think I am still left with a doubt. Say I dump the data into a collection without any checks/conditions; would the final routine that weeds out corrupt data and inserts into the production table be triggered by the application or a cron job? If so, how would it be different from weeding out unwanted data at stage 1 itself and inserting directly into the production table? – meson10 Jan 13 '14 at 03:13
  • @meson10 Because you can do the weeding in a big set operation, you don't have to do it one by one in individual transactions. Operations on sets are generally significantly faster. – Craig Ringer Jan 13 '14 at 03:33