97

I'm using Postgres and would like to run a big update query that picks up values from a CSV file. Let's say I have a table with the columns (id, banana, apple).

I'd like to run an update that changes the Bananas and not the Apples; each new Banana and its id would be in the CSV file.

I tried looking at the Postgres site but the examples are killing me.

user519753
  • You are not trying to do that from within pgadmin3, are you? You probably need a scripting language of some sort (e.g. Python, ...). You also need to clarify what you mean by "update". My wild guess is that your CSV file contains items that may or may not be in the DB, and you must either INSERT them or UPDATE them - only if they are Bananas. But, please clarify. – Jan 18 '12 at 13:06

3 Answers

215

COPY the file to a temporary staging table and update the actual table from there. Like:

CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- but see below

COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv);

UPDATE tbl
SET    banana = tmp_x.banana
FROM   tmp_x
WHERE  tbl.id = tmp_x.id;

DROP TABLE tmp_x; -- else it is dropped at end of session automatically

If the imported table matches the table to be updated exactly, this may be convenient:

CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;

Creates an empty temporary table matching the structure of the existing table, without constraints.
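A near-equivalent alternative, in case you prefer to spell out the structure copy explicitly (a sketch; LIKE copies column names, data types and not-null constraints, and pulls in defaults or other constraints only if you add INCLUDING options):

CREATE TEMP TABLE tmp_x (LIKE tbl);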

Privileges

Up to Postgres 10, SQL COPY requires superuser privileges for this.
In Postgres 11 or later, there are also some predefined roles (formerly "default roles") to allow it. The manual:

COPY naming a file or command is only allowed to database superusers or users who are granted one of the roles pg_read_server_files, pg_write_server_files, or pg_execute_server_program [...]
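For example, a superuser could grant one of those roles to allow server-side COPY ... FROM for a given user (a sketch; the role name batch_importer is made up):

GRANT pg_read_server_files TO batch_importer;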

The psql meta-command \copy works for any db role. The manual:

Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.

The scope of temporary tables is limited to a single session of a single role, so the above has to be executed in the same psql session:

CREATE TEMP TABLE ...;
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE ...;

If you are scripting this from bash, be sure to wrap it all in a single psql call. Like:

echo 'CREATE TEMP TABLE tmp_x ...; \copy tmp_x FROM ...; UPDATE ...;' | psql

Normally, you need \\ to switch between psql meta-commands and SQL commands on the same line, but \copy is an exception to this rule. The manual again:

special parsing rules apply to the \copy meta-command. Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.
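In a script, the simplest way to stay clear of that rule is to feed psql a multi-line here-document and keep \copy on a line of its own. A sketch, reusing the statements from above (the database name mydb is a placeholder):

psql mydb <<'SQL'
CREATE TEMP TABLE tmp_x (id int, apple text, banana text);
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv)
UPDATE tbl SET banana = tmp_x.banana FROM tmp_x WHERE tbl.id = tmp_x.id;
SQL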

Big tables

If the imported table is big, it may pay to increase temp_buffers temporarily for the session (first thing in the session):

SET temp_buffers = '500MB';  -- example value

Add an index to the temporary table:

CREATE INDEX tmp_x_id_idx ON tmp_x(id);

And run ANALYZE manually, since temporary tables are not covered by autovacuum / auto-analyze.

ANALYZE tmp_x;


Erwin Brandstetter
  • Yup, nice one. I'm always leaning towards the huge machinery when things can sometimes be made so simple. –  Jan 18 '12 at 13:27
  • @user519753: Just learned a new term - and from what I see on the internets a "thank you!" is in order. :) – Erwin Brandstetter Jan 18 '12 at 15:18
  • 4
    `COPY tmp_x FROM '/absolute/path/to/file' (DELIMITER ';', HEADER TRUE, FORMAT CSV)` worked better for me. See (http://www.postgresql.org/docs/9.1/static/sql-copy.html) – taper Jan 28 '13 at 09:14
  • 1
    @taper: I normally run COPY without any parameters. But the question is about *CSV* as you may have noticed. – Erwin Brandstetter Jan 28 '13 at 12:44
  • 1
    this only worked for me (Postgres 9.3) after replacing `USING` with `FROM` in the `UPDATE`-statement – artm Sep 04 '14 at 08:45
  • @artm: Thanks, fixed. Confusingly, it's `USING` for `DELETE` to join in tables (since the `FROM` keyword is already in use there). – Erwin Brandstetter Sep 04 '14 at 17:30
  • for postgres 9.3 keyword `format` should be replaced with keyword `with`. – Andremoniy Nov 28 '14 at 09:32
  • @Andremoniy: The keyword `FORMAT` is unchanged, `WITH` is an optional (unrelated) noise word. But I added parentheses according to modern syntax, thanks for the hint. – Erwin Brandstetter Nov 28 '14 at 15:32
  • In Amazon Web Services' RDS, they say to use `target-db=> \copy source-table from 'source-table.csv' with DELIMITER ',';` for the import step - note the backslash before the copy. See [this page](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html) – egeland Apr 26 '15 at 15:19
  • @egeland: `\copy` is the psql meta-command that encapsulates the SQL command `COPY` in the client: http://stackoverflow.com/questions/8119297/postgresql-export-resulting-data-from-sql-query-to-excel-csv/8119342#8119342 – Erwin Brandstetter Apr 26 '15 at 18:58
  • `500MB` should be single quoted. – malthe Sep 02 '15 at 13:58
  • @malthe: Thanks, applied. – Erwin Brandstetter Sep 02 '15 at 14:25
  • Basically, to update a column in a table with updated values from an external file: `CREATE TEMP TABLE temp_table ...; UPDATE original_table SET col = temp_table.col FROM temp_table WHERE original_table.id = temp_table.id;`. Choose another matching column (other than `id`) -- if available -- if you don't want to mess with your `id` values. – Victoria Stuart Jun 27 '19 at 21:47
  • What if the CSV file also has records which need to be inserted into the table? – Govind Gupta Sep 09 '19 at 15:01
  • Is it possible to pass `$pwd` in the `\copy` command, to generate the first part of the absolute path dynamically? I saw other answers like https://stackoverflow.com/a/33271507/5031446 that include it, but it doesn't work for me (using PostgreSQL 12.5 and psql 13.0) – Xoel Dec 15 '20 at 09:59
3

I was having the same problem, but ran into some difficulties with this solution: since I was not a superuser, using COPY gave an error. So I found an alternative solution for my problem.

I am using PostgreSQL and pgAdmin 4; here is the solution I came up with.

  1. Create a new table with the same structure as the fruits table, without data:
CREATE TABLE fruits_copy AS TABLE fruits WITH NO DATA;
  2. Import the CSV file data into the new table (fruits_copy). I am using pgAdmin 4, so I did this step through its import dialog. (It may differ for you; a \copy alternative is sketched after this list.)

  3. Update the fruits table from the fruits_copy table:

UPDATE fruits SET banana = fruits_copy.banana FROM fruits_copy WHERE fruits.id = fruits_copy.id;

  4. After that, if you want to delete the new table, you can just drop it:

DROP TABLE fruits_copy;
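If you would rather do the import in step 2 from psql instead of the pgAdmin dialog, the same \copy approach from the accepted answer works here as well. A minimal sketch, assuming the CSV has a header row and lives at a made-up path:

\copy fruits_copy FROM '/path/to/fruits.csv' (FORMAT csv, HEADER)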

-1

You can try the code below, written in Python. The input file is the CSV file whose contents you want to write into the table. Each row is split on commas, so row[0] is the value in the first column, row[1] the value in the second column, and so on.

    import csv
    import os

    import psycopg2

    conn = None
    try:
        # Connection parameters are placeholders - adjust them for your setup.
        conn = psycopg2.connect("host=localhost dbname=prodmealsdb user=postgres password=blank")
        cur = conn.cursor()

        filepath = '/path/to/your/data_to_be_updated.csv'
        ext = os.path.splitext(filepath)[-1].lower()
        if ext == '.csv':
            with open(filepath) as csvfile:
                next(csvfile)  # skip the header row
                readCSV = csv.reader(csvfile, delimiter=',')
                for row in readCSV:
                    print(row[3], row[5])
                    # row[3] is the id column and row[5] the value to write;
                    # adjust the indices to match your CSV layout.
                    cur.execute("UPDATE your_table SET column_to_be_updated = %s WHERE id = %s",
                                (row[5], row[3]))
            conn.commit()  # commit once after all rows are processed
            cur.close()

    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()