I am working with an external data source (a regulatory database) in the form of a large CSV file (several GiB, ~200M rows), which I import as-is into a separate PostgreSQL table so I can run queries against it:
CREATE TABLE foo (id serial primary key, bar integer, baz integer, UNIQUE (bar, baz));
COPY foo (bar, baz) FROM 'input.csv' WITH (FORMAT CSV, HEADER TRUE);
This file is updated every week or so, with only a very small number of rows actually changing (added or deleted).
Is there a way to synchronize the table with the updated CSV that is more efficient than redoing the above and then atomically renaming the table (roughly the workflow sketched below)?
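
For reference, a minimal sketch of the full-reload-and-swap approach I currently use (assuming nothing else, such as views or foreign keys, depends on foo, and that input.csv is a path accessible to the server process):

BEGIN;
-- Load the new CSV into a fresh table with the same structure.
CREATE TABLE foo_new (id serial primary key, bar integer, baz integer, UNIQUE (bar, baz));
COPY foo_new (bar, baz) FROM 'input.csv' WITH (FORMAT CSV, HEADER TRUE);
-- Swap the tables; concurrent readers see the switch atomically once the transaction commits.
DROP TABLE foo;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;

This reloads all ~200M rows every time, even though only a handful of them differ from week to week, which is what I would like to avoid.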