
I'm importing some rows into my Postgres database like so:

psql -U postgres import_test < 1432798324_data

where import_test is my database and 1432798324_data is a plain-text file formatted like:

COPY cars FROM stdin;
<row data>
<row data>
...
\.

COPY drivers FROM stdin;
<row data>
<row data>
...
\.

(I got the format for this plain text file from the answer here).

This method works fine when I'm importing into a blank database. However, if the database isn't blank and the import encounters any duplicate rows, I get an error:

ERROR:  duplicate key value violates unique constraint "car_pkey"

Is there any way I could modify my import command to force an overwrite if duplicates are found? In other words, if I'm importing a row and there's already a row with that id, I want my new row to overwrite it.

Robert

2 Answers


You can import into a temporary table first, then delete the existing rows that are about to be replaced before copying over the new data:

-- Stage the incoming rows in a temporary table with the same columns as drivers
create temporary table import_drivers as select * from drivers limit 0;

copy import_drivers from stdin;

begin transaction;

-- Remove the existing rows that are about to be replaced
delete  from drivers
where   id in
        (
        select  id
        from    import_drivers
        );

-- Copy the new data across
insert  into drivers
select  *
from    import_drivers;

commit transaction;
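If you are on PostgreSQL 9.5 or later, the delete-then-insert pair can be collapsed into a single upsert with INSERT ... ON CONFLICT, which avoids the second pass over the table. A minimal sketch, assuming drivers has columns (id, name); adjust the column list to match your schema:

-- Insert each staged row; on a duplicate id, overwrite the existing row
insert  into drivers (id, name)
select  id, name
from    import_drivers
on conflict (id) do update
        set name = excluded.name;

Here excluded refers to the incoming row that hit the unique constraint, so each duplicate id is overwritten in place. A single statement is also atomic on its own, so no explicit transaction block is needed.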
Andomar
  • So this code goes in my plain text file? I'll basically have a chunk of code like this for each table in my plain text? – Robert May 28 '15 at 08:39
  • And also, is there a more efficient way to do this, since some of these tables have millions of rows? This has to insert all the rows 2 times right? – Robert May 28 '15 at 08:44
  • I would just delete all the indices that need to be overwritten, before importing. – Barney Szabolcs Jun 26 '19 at 17:13

One way to deal with this when you are constantly doing a bulk import (let's say daily) is to use table partitioning.

You would just add a time field to your cars and drivers tables, set to the time of the import. The primary key for both tables then becomes a two-tuple of the existing primary key and the time field.

Once the import is done, you just drop the older partitions (with a daily scheme you would drop the previous day), or alternatively filter on max(time_field) in your queries.
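A minimal sketch of that scheme for the drivers table, assuming the default primary-key constraint name drivers_pkey and an illustrative column name imported_at:

-- Tag every row with the time of its import
alter table drivers add column imported_at timestamptz not null default now();

-- Widen the primary key to (id, imported_at) so repeated imports
-- of the same id no longer collide
alter table drivers drop constraint drivers_pkey;
alter table drivers add primary key (id, imported_at);

-- Read only the rows from the most recent import
select  *
from    drivers
where   imported_at = (select max(imported_at) from drivers);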

Adam Gent