
I have to insert a large number of log records into a table every hour, and I don't care about integrity errors or constraint violations that happen in the process.

If I disable autocommit and do a bulk insert, the cursor won't insert anything beyond the row where the transaction failed. Is there a way around this?

One hack is to handle this at the application level: implement an n-sized buffer and do bulk inserts. If something fails in that transaction, recursively repeat the insert for the first half and then the second half of the buffer:

def insert(buffer):
    # bulk_insert() and connection come from the surrounding application code.
    if not buffer:
        return
    try:
        bulk_insert(buffer)
        connection.commit()  # commit now so a later rollback can't undo this chunk
    except Exception:  # e.g. an IntegrityError raised by the driver
        connection.rollback()

        if len(buffer) == 1:
            return  # a single bad row: skip it

        # Bisect and retry each half independently.
        marker = len(buffer) // 2  # integer division; len(buffer)/2 is a float in Python 3
        insert(buffer[:marker])
        insert(buffer[marker:])

But I'm really hoping this can be achieved with something built into Postgres?

meson10
  • Did you search this site before asking this? It's a real FAQ. – Craig Ringer Jan 11 '14 at 03:08
  • Actually I did. If you like, I can share links to the questions I deemed similar. Mostly, someone would end up answering with "do not make an update on this column" or "change your fkey". I am plainly looking for a generic answer, which btw you have provided. Thank you :-) – meson10 Jan 11 '14 at 03:47
  • No worries. It's worth mentioning what you tried / looked into first, that's all. Sometimes it'll save you from getting the same things suggested to you all over again. – Craig Ringer Jan 11 '14 at 03:53
  • Well, if it fits in a (python?) buffer, it cannot be bulk. BTW: you don't need to rollback on error. An error implies a rollback. – wildplasser Jan 11 '14 at 13:33
  • That code was more for convenience's sake :-) I was going to send batches of values anyway. Say, 10K rows at a time. – meson10 Jan 11 '14 at 15:22

1 Answer


PostgreSQL doesn't provide anything built-in for this. You can use SAVEPOINTs, but they're not much better than individual transactions.
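For completeness, the SAVEPOINT approach looks roughly like this. This is a minimal sketch assuming psycopg2 and a hypothetical logs(message) table; rows stands in for whatever iterable your application has buffered. Each failing row is rolled back to its savepoint and skipped, at the cost of an extra round trip and savepoint bookkeeping per row:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection details
cur = conn.cursor()

for message in rows:  # rows: the application's buffered log records (assumed)
    cur.execute("SAVEPOINT bulk_sp")
    try:
        cur.execute("INSERT INTO logs (message) VALUES (%s)", (message,))
    except psycopg2.IntegrityError:
        # Undo just this row; the rest of the transaction survives.
        cur.execute("ROLLBACK TO SAVEPOINT bulk_sp")
    else:
        cur.execute("RELEASE SAVEPOINT bulk_sp")

conn.commit()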

Treat each insert as an individual transaction and work on making those transactions faster:

  • SET synchronous_commit = off in your session
  • INSERT into an UNLOGGED table, then INSERT INTO ... SELECT the results into the real table after checking (a sketch of this staging approach follows the list)
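Here is a rough sketch of that staging-table route, again assuming psycopg2; the table and column names (logs, logs_stage, message), the duplicate check, and rows are illustrative assumptions, not part of the answer:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection details
cur = conn.cursor()

# Losing staged data on a crash is acceptable for log records, so relax durability.
cur.execute("SET synchronous_commit = off")
cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS logs_stage (message text)")

# Dump everything into the staging table with no checks at all.
cur.executemany("INSERT INTO logs_stage (message) VALUES (%s)",
                [(m,) for m in rows])

# Weed out bad rows in one set operation, then move the rest into the real table.
cur.execute("""
    INSERT INTO logs (message)
    SELECT s.message FROM logs_stage s
    WHERE s.message IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM logs l WHERE l.message = s.message)
""")
cur.execute("TRUNCATE logs_stage")
conn.commit()

The WHERE clause in the final INSERT INTO ... SELECT is where whatever validity checks you actually need would live.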

Here's an earlier closely related answer which also links to more information. I haven't marked this as a duplicate because the other one is specific to upsert-like data loading, while you're interested in more general error handling.

Craig Ringer
  • I think I am still left with a doubt. Say I dump the data into a collection without any checks/conditions; would the final routine that weeds out corrupt data and inserts into the production table be triggered by the application or a cron job? If so, how would it be different from weeding out unwanted data at stage 1 itself and inserting directly into the production table? – meson10 Jan 13 '14 at 03:13
  • @meson10 Because you can do the weeding in a big set operation, you don't have to do it one by one in individual transactions. Operations on sets are generally significantly faster. – Craig Ringer Jan 13 '14 at 03:33