
I have a CSV file with 10 columns. I have created a PostgreSQL table with 4 columns, and I want to copy 4 of the 10 CSV columns into it.

The columns of my CSV file are:

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

The columns of my PostgreSQL table should be:

x2 x5 x7 x10
POTENZA

8 Answers


If it is an ad hoc task

Create a temporary table with all the columns in the input file:

create temporary table t (x1 integer, ... , x10 text)

Copy from the file into it:

copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)

Now insert into the definitive table from the temp:

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t

And drop it:

drop table t
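
Putting it together, here is a minimal sketch of the whole ad hoc flow in one transaction (the column types and file path are assumptions; adjust them to your data):

begin;

-- temp table mirroring the CSV; dropped automatically at commit
create temporary table t (
    x1 integer, x2 text, x3 text, x4 text, x5 text,
    x6 text, x7 text, x8 text, x9 text, x10 text
) on commit drop;

copy t from '/path/to/my_file' with (format csv);

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t;

commit;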

If it is a frequent task

Use the file_fdw extension. As superuser:

create extension file_fdw;

create server my_csv foreign data wrapper file_fdw;

create foreign table my_csv (
    x1 integer,
    x2 text,
    x3 text
    -- ... declare every column present in the file, through x10
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv');

Grant select permission on the table to the user who will read it:

grant select on table my_csv to the_read_user;

Then whenever necessary read directly from the csv file as if it were a table:

insert into my_table (x2)
select x2
from my_csv
where x1 = 2
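
A variation suggested in the comments below is to declare every foreign-table column as text and cast only when reading; a hedged sketch reusing the server above (the table name my_csv_text is illustrative):

create foreign table my_csv_text (
    x1 text, x2 text, x3 text, x4 text, x5 text,
    x6 text, x7 text, x8 text, x9 text, x10 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv');

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10   -- add casts here if my_table's columns are not text
from my_csv_text
where x1::integer = 2;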
Clodoaldo Neto
  • Isn't there a cleaner way? – pratnala Aug 15 '13 at 06:44
  • @pratnala no, there isn't – 1ac0 Nov 16 '14 at 19:38
  • Wow. PostgreSQL fail. Even the Import command in pgadmin3 doesn't let you simply tick off which columns of the CSV file you want to include. Sheesh, this seems like 1980s stuff. – Joe Strout Jan 08 '15 at 16:17
  • In PostgreSQL 9+ use this copy query: `COPY t (x1, ... , x10) FROM '/path/to/my_file' WITH CSV` – more in the [PostgreSQL docs](http://www.postgresql.org/docs/9.4/static/sql-copy.html) – juliocesar Oct 21 '15 at 22:17
  • @JoeStrout: pgAdmin is not PostgreSQL. It is one of the clients for PostgreSQL. Use another client of your choice. – Clodoaldo Neto Feb 16 '16 at 11:23
  • I recommend always using the foreign data wrapper. It's an absolute life saver. I usually load each and every field as text when loading from CSV and then cast the values from the foreign table when needed. – Dennis Bauszus Jul 19 '16 at 19:59
  • Isn't the frequent suggestion the slower one? How often is the text of the CSV converted into PostgreSQL's internal data representation? Is it converted during every select? And if not, how long is the converted data cached? And what happens to the cache when the CSV file gets changed? – ceving Jan 21 '20 at 13:13
  • It's not clear to me what @juliocesar's example is trying to do, but according to the linked docs, "WITH CSV" is the pre-9.0 format that is still supported and "with (format csv)" appears to be the 9.0 format. Neither appears to support inserting a subset of columns with just the COPY query. – IBBoard Feb 13 '20 at 14:32
  • Does not work if your CSV has more than 1600 columns. – Pak Aug 17 '21 at 16:11

You can specify the columns you want to fill with the COPY command, like so:

\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;

Here's the documentation for the COPY command: https://www.postgresql.org/docs/current/sql-copy.html

Julien
  • Still, you can't copy fewer columns than there are in the file. With this command, you would copy the first column of the file to x2, the second to x5, and so on, provided the file has only 4 columns. – Rémi Svahn Jul 24 '15 at 09:20
  • This is useful for me because the primary key of the table was of type serial, so this column was not using data from the csv file. – Paul Taylor Sep 30 '21 at 13:07
  • If you copy less column data than is available in the CSV file, you will get `ERROR: extra data after last expected column`. – Makesh Jun 20 '22 at 08:13

As other answers have pointed out, it has long been possible to specify the columns to copy into the PG table. However, without a way to reference columns in the CSV by name, this was of little use except for loading into a table whose columns have a different order.

Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:

PROGRAM

A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.

Note that the command is invoked by the shell, so if you need to pass any arguments to shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

This was the missing piece we needed for this eagerly awaited functionality. For example, we could use this option in combination with cut (on a UNIX-based system) to select certain columns by position:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

However, cut has several limitations when manipulating CSVs: it can't adequately handle strings with commas (or other delimiters) inside them and doesn't allow selecting columns by name.

There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller to select columns by name:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)
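
Along the same lines, a hedged sketch using csvkit's csvcut, which also selects columns by name (assuming the header row contains x2, x5, x7, x10):

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'csvcut -c x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)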

arredond
  • How did the absence of `PROGRAM` prevent you from using csvkit or Miller to select a subset of columns from the original CSV file, write them to a new CSV file, and then import only those columns into the table using `\copy`? – Derek Mahar Apr 24 '18 at 20:30
  • @DerekMahar What PROGRAM allows is to perform import operations that involve manipulating the original CSV programmatically, i.e., all inside Postgres. This is especially useful when used inside a function to automate the import process. – arredond Apr 26 '18 at 07:24

Just arrived here in pursuit of a solution to load only a subset of columns, but apparently it's not possible. So, use awk (or cut) to extract the wanted columns to a new file, new_file:

$ awk '{print $2, $5, $7, $10}' file > new_file

and load new_file. You could also pipe the output straight to psql:

$ cut -d \  -f 2,5,7,10 file | 
  psql -h host -U user -c "COPY table(col1,col2,col3,col4) FROM STDIN DELIMITER ' '" database

Notice COPY, not \COPY.

Update:

As was pointed out in the comments, neither of the above examples can handle quoted delimiters in the data. The same goes for newlines, as awk and cut are not CSV aware. Quoted delimiters can be handled with GNU awk, though.

This is a three-column file:

$ cat file
1,"2,3",4

Using GNU awk's FPAT variable we can change the order of the fields (or get a subset of them) even when the quoted fields have field separators in them:

$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$1,$3}' file
"2,3",1,4

Explained:

$ gawk '
BEGIN {                          # instead of field separator FS
    FPAT="([^,]*)|(\"[^\"]+\")"  # ...  we define field pattern FPAT
    OFS=","                      # output field separator OFS
} 
{
    print $2,$1,$3               # change field order
    # print $2                   # or get a subset of fields
}' file 

Notice that FPAT is GNU awk only. For other awks it's just a regular variable.
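
Putting the pieces together, a sketch of a quote-aware load using this approach, assuming no header row (the host, user, and table names are placeholders):

$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$5,$7,$10}' file.csv |
  psql -h host -U user -c "COPY my_table (x2, x5, x7, x10) FROM STDIN WITH (FORMAT csv)" database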

James Brown
  • cut also supports ranges, like _cut -d',' -f1-5,7,20 file_. Useful if you have a file with many columns. – rags Nov 01 '17 at 07:58
  • @rags True. Generally I'm used to using awk, as it allows printing columns in an arbitrary order. With `psql` it does not matter, since you can change the order of table columns in `\COPY` (`-c "\COPY table(col4,col3,col2,col1)..."`). – James Brown Nov 01 '17 at 08:04
  • Like the use of a pipe to handle parsing data before it goes into the database. – Guy Park Nov 12 '19 at 03:10
  • Note: Both `cut` and `awk` *will not* give the expected results if your CSV file contains quoted text including commas! Parsing CSVs with simple tools may work, but only for simple cases. – IBBoard Feb 13 '20 at 14:27
  • @IBBoard True. Or if there are newlines etc., as awk and `cut` are not CSV aware. I'll add an example of how to handle quoted commas using GNU awk. – James Brown Feb 13 '20 at 14:57

You could take James Brown's suggestion further and do it all in one line:

$ awk -F ',' '{print $2","$5","$7","$10}' file | psql -d db -c "\copy MyTable from STDIN csv header"
Chris Lawton

If the number of rows reported as imported is not important to you, you could also:

create two tables:

  • t1 (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10): with all the columns of the CSV file
  • t2 (x2 x5 x7 x10): as you need it

then create:

  • a trigger function, where you insert the desired columns into t2 instead and return NULL to prevent the row from being inserted into t1

  • a trigger for t1 (BEFORE INSERT FOR EACH ROW) that calls this function.

Especially with larger CSV files, BEFORE INSERT triggers are also useful for filtering out rows with certain properties beforehand, and you can do type conversions as well.
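
A minimal sketch of this approach (all names and the text column types are assumptions):

create table t1 (x1 text, x2 text, x3 text, x4 text, x5 text,
                 x6 text, x7 text, x8 text, x9 text, x10 text);
create table t2 (x2 text, x5 text, x7 text, x10 text);

create function redirect_to_t2() returns trigger
language plpgsql as $$
begin
    -- keep only the wanted columns, casting here if needed
    insert into t2 (x2, x5, x7, x10)
    values (new.x2, new.x5, new.x7, new.x10);
    return null;  -- prevent the row from landing in t1
end $$;

create trigger t1_redirect
    before insert on t1
    for each row execute procedure redirect_to_t2();

copy t1 from '/path/to/my_file' with (format csv);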

Michael Kraxner

To load data from a spreadsheet (Excel or OpenOffice Calc) into PostgreSQL:

Save the spreadsheet page as a CSV file. The preferred method is to open the spreadsheet in OpenOffice Calc and do the saving there. In the "Export to text file" window, choose Unicode (UTF-8) as the character set, "," as the field delimiter and '"' as the text delimiter. A message will be displayed saying that only the active sheet was saved. Note: the file has to be saved in a folder, not on the desktop, and has to be saved in UTF-8 format (PostgreSQL is by default set up for UTF-8 encoding). If it is saved on the desktop, PostgreSQL will give an "access denied" message and won't upload.

In PostgreSQL, create an empty table with the same number of columns as the spreadsheet.

Note: each column name has to be the same, and each data type has to be the same. Also, keep the length of the data in mind; give character varying columns enough room.

Then in PostgreSQL, in the SQL window, run:

copy "ABC"."def" from E'C:\\tmp\\blabla.csv' delimiters ',' CSV HEADER;

NOTE: Here C:\\tmp is the folder where the CSV file "blabla" is saved. "ABC"."def" is the table created in PostgreSQL, where "ABC" is the schema and "def" is the actual table. Then do "execute query" by pressing the green button at the top. "CSV HEADER" is needed when the CSV file has a header row with column names.

If everything is OK, no error message will be displayed and the table data from the CSV file will be loaded into the PostgreSQL table. But if there is an error message, do the following:

If the error message says that the data is too long for a specific column, then increase the column size. This happens mostly with character and character varying columns. Then run the "execute query" command again.

If the error message says that the data type doesn't match a particular column, then change the data type of the PostgreSQL table column to match the one in the CSV file.

In your case, after creating the CSV file, delete the unwanted columns and match the columns to those in the Postgres table.

Sagun

One quick way to copy a table out to your local directory is:

\copy (select * from table_name) to 'data.csv' CSV;
Weevils