0

I have two tables. One of them is a temporary table in which I copy the data from a big CSV file. After that I update my other table with the temporary table (see this answer: Copy a few of the columns of a csv file into a table).

When I update my temporary table once more with a (updated) CSV file (data from a grep in bash, increasing row numbers per update) I want to delete the rows that are not affected by the update. I could have a temp table smaller than a temp table with all the data.

First: Is it better drop all data in the temp table and to fill it with the whole updated CSV data and after that to update/insert the other table. Second: Or to update the temp table in the first place?

So it is a matter of size of the tables. I talking about 500k rows (with geometry columns).

An example:

table
1, NULL
2, NULL

temp table
1, hello
2, good morning

CSV
1, hello there
2, good morning
3, good evening

temp table
1, hello there
2, good morning
3, good evening

OR

temp table
1, hello there
3, good evening

So my question is how to update a table with a CSV file, insert new rows, update the old rows and delete the rows that were not affected by the update.

Community
  • 1
  • 1
Stefan
  • 1,383
  • 2
  • 15
  • 25

2 Answers2

4

So my question is how to update a table with a CSV file, insert new rows, update the old rows and delete the rows that were not affected by the update.

I see two possible solutions:

1. Apply the changes individually

The data is applied with a series of update/delete/insert statements like this:

-- get rid of deleted rows
delete from the_table
where not exists (select 1
                  from temp_table tt
                  where tt.id = the_table.id);

-- update changed data
update the_table 
   set ..
from temp_table src
where src.id = the_table.id;


-- insert new rows
insert into the_table
select ..
from temp_table src
where not exists (select 1
                  from the_table t2
                  where t2.id = src.id);

This is the required approach if other sources write to the target table and you don't want to overwrite that. Maybe you don't even want to delete "missing" rows then. Or update only a sub-set of the columns.

2. Flush and fill the table

If you never modify the data in the target table and you don't have foreign keys referencing that table, I would do a flush and fill of the real table:

truncate the_table;
copy the_table from '/path/to/data.csv' ...;

If you run the truncate and copy in a single transaction, the copy performance will improve because it minimizes the amount of WAL logging.

0

I haven't many experience with SQL(half year), but maybe you will compare your table using MINUS clause? Using MINUS you can get non-updated rows? P.S. I'm talking about PL/SQL)

  • 1
    `minus` is an Oracle operator. In standard SQL (and Postgres) this is called `except`. Plus. `MINUS` is **not** PL/SQL, that's SQL. –  Oct 14 '14 at 07:58
  • Sure, when I said PL/SQL, I mean Oracle SQL(MINUS) – grossoghur Oct 14 '14 at 09:48