
I have a lot of records that originally come from MySQL. I massaged the data so that it can be inserted into PostgreSQL using ActiveRecord. I can do this easily with row-by-row insertions, i.e. one row at a time, but that is very slow. I want to do a bulk insert, but it fails if any of the rows contains invalid data. Is there any way I can achieve a bulk insert where only the invalid rows fail instead of the whole batch?

zulqarnain

2 Answers

1

COPY

When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.

If the data itself (not decorators) violates your table definition, there are ways to make this a lot more tolerant, though. For instance: create a temporary staging table with all columns of type text. COPY to it, then fix offending rows with SQL commands before casting to the actual data types and inserting into the actual target table.
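A minimal sketch of that staging-table route might look like this (table, column and file names are placeholders, and the repair rule is just an example):

CREATE TEMP TABLE tbl_stage (col1 text, col2 text);  -- all columns as text

COPY tbl_stage FROM '/path/to/data.csv' WITH (FORMAT csv);

-- repair offending rows while everything is still text
UPDATE tbl_stage SET col2 = '0' WHERE col2 IS NULL OR col2 = '';

-- cast and move into the real target table
INSERT INTO tbl (col1, col2)
SELECT col1, col2::int
FROM   tbl_stage;

Since the staging table accepts anything as text, COPY can no longer fail on values that don't fit the target data types; all the validation is pushed into SQL that you control.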

Consider this related answer:
How to bulk insert only new rows in PostgreSQL

Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY

If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:

ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;

Or you just fix the source data: COPY tells you the number of the offending line. Fix it with an editor of your preference, then retry. I like to use vim for that.

INSERT

For an INSERT (as discussed in the comments), the check for NULL values is trivial:

To skip a row with a NULL value:

INSERT INTO tbl (col1, ...)
SELECT col1, ...
WHERE  col1 IS NOT NULL;

To insert something else instead of a NULL value (an empty string in this example):

INSERT INTO tbl (col1, ...)
SELECT COALESCE(col1, ''), ...
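Adapted to a multi-row VALUES list like the one described in the comments, a sketch (with made-up table and column names) could combine both ideas:

INSERT INTO tbl (col1, col2)
SELECT v.col1, COALESCE(v.col2, '')
FROM  (VALUES
         (1, 'a'),
         (2, NULL),      -- NULL replaced by ''
         (NULL, 'c')     -- whole row skipped
      ) v(col1, col2)
WHERE  v.col1 IS NOT NULL;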
Erwin Brandstetter
  • Sorry for the late reply. I am not using COPY, I am using INSERT with multiple values. Also using activerecord-activejdbc. – zulqarnain Jun 10 '13 at 12:30
  • @zulq: I added solutions for `INSERT`. Note that `COPY` is *much* faster with lots of rows. – Erwin Brandstetter Jun 10 '13 at 15:10
  • I am not doing insert into (col1, ...) select. I'm only using insert into tablename(col1, col2, col...) VALUES (set of values), (another set of values), (another set of values) etc. – zulqarnain Jun 11 '13 at 09:39
0

A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where the input data is sufficiently bogus, text-typed columns.

You can then run INSERT INTO ... SELECT queries against that data to populate the real table, using one big query that cleans up the data during import. CASE expressions are useful for this; the idea is to transform the data in a single pass.
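For example, a cleanup query over a text-typed staging table could look roughly like this (all names and repair rules here are hypothetical):

INSERT INTO real_table (id, amount, created_at)
SELECT staging.id::int,
       CASE WHEN staging.amount ~ '^[0-9]+(\.[0-9]+)?$'
            THEN staging.amount::numeric
            ELSE 0 END,
       CASE WHEN staging.created_at = '0000-00-00 00:00:00'  -- MySQL "zero date"
            THEN NULL
            ELSE staging.created_at::timestamp END
FROM   staging;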

You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem; see e.g. https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .

For more complicated cases, look at ETL tools such as Pentaho Kettle or Talend Studio.

Craig Ringer
  • It would be cool if Postgres had something similar to Oracle's `LOG ERRORS INTO ...` clause – a_horse_with_no_name Jun 10 '13 at 15:31
  • @a_horse_with_no_name Yes, it would. Because of the way Pg works internally that'd require a `SAVEPOINT` for every row, though, and that'd be pretty horrifying from a performance point of view, plus it'd be a great way to get to xid wrap-around prevention vacuum in a huge hurry. Though if you disallow `AFTER` triggers it might work... – Craig Ringer Jun 11 '13 at 01:04