22

I'm trying to import a CSV file into PostgreSQL using COPY. It chokes when it hits a row where there are empty values, e.g. the second row below:

JAN-01-2001,1,2,3,4,5

JAN-02-2001,6,7,,,

I've tried this COPY statement, as well as variants using NULL and QUOTE and havent found anything that works.

COPY data FROM 'data.dat' USING DELIMITERS ',' CSV;

Any suggestions? The data file is in a massive 22GB flat file, so I'd like to avoid editing it directly.

ugh
  • 769
  • 2
  • 6
  • 6
  • 1
    That's not full story. According to [documentation](http://www.postgresql.org/docs/8.3/static/sql-copy.html) and my check above case works well under any modern PostgreSQL (empty value with no quotes in CSV mode). Maybe you have NOT NULL modifier or invalid number of commas ? – Grzegorz Szpetkowski Dec 01 '11 at 23:43
  • ERROR: invalid input syntax for type numeric: CONTEXT: COPY data, line 13, column interval_2400: – ugh Dec 02 '11 at 19:43
  • 1
    If I edit the file and read in only the first 12 lines, the import works just fine. It only chokes when it gets to the 13th line, which is missing data values and looks like my example above. – ugh Dec 03 '11 at 03:56
  • I'm having the same problem and still didn't found an answer. Maybe I'll have to transform my data before importing it or don't use the "Copy from" functionality. It's a shame. – Iúri dos Anjos Mar 24 '19 at 22:23
  • Well, I finally did find a way out, but using python with psycopg2 library. Command to copy (you need to get a connect first and open a cursor): "cursor.copy_from(f, 'aircrafts', sep=',', null='')" – Iúri dos Anjos Mar 24 '19 at 22:26

6 Answers6

1

I would suggest converting your numeric columns to text columns for the purposes of your import. The reason is that an empty string is not a valid numeric value. Change your numeric columns to text columns, import the CSV file, update the empty values to null or 0, and then change the column back to an integer.

Kenaniah
  • 5,171
  • 24
  • 27
  • 4
    In CSV format, nothing in between delimiters signifies `NULL`, not an empty string, which is valid syntax for a numeric value. Empty strings are written as empty double quotes: `""`. – Erwin Brandstetter Dec 03 '11 at 00:03
  • If anyone like me gets confused by Erwin's helpful comment about empty strings signifying blank in CSV files -- I was using a text (TSV) file...it's a different symbol there for blank values: '\N'. – Sigfried Nov 23 '15 at 13:17
1

Your statement is suspicious:

COPY data FROM 'data.dat' USING DELIMITERS ',' CSV;

DELIMITERS was used in versions before 7.3. It is still supported in order not to break old code, but don't use it any more. The proper keyword is DELIMITER. And you don't need to specify , at all as it is the default for FORMAT CSV.
Also, I quote the manual here:

filename

The absolute path name of the input or output file. Windows users might need to use an E'' string and double any backslashes used in the path name.

Bold emphasis mine. Replace 'data.dat' with something like '/path/to/data.dat' on UNIX or E'C:\\path\\to\\data.dat' on Windows.

For versions 7.3+ use:

COPY data FROM '/path/to/data.dat' CSV

For versions 9.0+ use:

COPY data FROM '/path/to/data.dat' (FORMAT CSV)

If you still get this error:

ERROR: invalid input syntax for type numeric:
CONTEXT: COPY data, line 13, column interval_2400:

Then, obviously, the source file does not match the structure of table data. Have a look at your source file, go to line 13 and see what value is there for column interval_2400. Chances are, it's not numeric. In particular, an empty string ('') is not allowed in columns of numeric type.

You can either fix the source file or adapt the table definition:

ALTER TABLE data ALTER COLUMN interval_2400 TYPE text;

Or whatever type is more appropriate. Might be interval, judging from the name. (But text accepts almost any input values.)

Or, better yet, create a modified temporary file, COPY to it, fix offending values, then INSERT into the target table, casting from text. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    PostgreSQL 9.1.1, MacOSX 10.7.2. As I noted above, the import works just fine for the first 12 lines of the file. The columns are in fact numeric and match the table definition. It's just that some row have missing values. – ugh Dec 03 '11 at 03:53
  • @user1076175: Please edit your question with the first 15 rows of your data file. Also put the version information in your question (not in a comment to my answer). – Erwin Brandstetter Dec 03 '11 at 04:30
  • The main problem is that of corrupt csv of course. Right now I have a 50GB big csv which on every couple 1000 lines has a bad line with a field missing. So the question is could we tell postgresql to just go over the lines that throw error and continue the rest. – Smalcat Feb 19 '13 at 12:15
  • To my knowledge, that's not possible. You'll have to fix your CSV file or adapt the table definition like I outlined - possible COPY to a temporary table and insert into the main table after fixing incompatibilities. – Erwin Brandstetter Feb 19 '13 at 13:48
0

This is PostgreSQL bug - csv parser ignore last empty item and throw error - "PG::BadCopyFileFormat: ERROR: missing data for column".

i'm use a stupid hack:

If last item is empty, simple add a one delimiter to end of string:

1,2,3
1,2,,

This add missed last item in row to import data.

Sergio Belevskij
  • 2,478
  • 25
  • 24
0

One additional caveat- Check the line number of the error and make sure it is not a blank row in the CSV file. That will cause postgres to throw the same error about missing values.

111
  • 1,788
  • 1
  • 23
  • 38
0

You can do it this way:

COPY productos FROM 'path/to/csv/productos.csv' 
WITH DELIMITER ',' CSV HEADER FORCE NULL max_time, warn_time, 
time_displayed, id_departamento ENCODING 'ISO 8859-1';

This parameter FORCE NULL (comma-separated fieldlist) converts empty cells to null values into the fieldlist fields. Note that I also had to specify file encoding to make it work.

lemon
  • 14,875
  • 6
  • 18
  • 38
Caesar
  • 1
-1

Anyone coming here with smaller files: here's the easiest fix I've found to this and inconsistent number of delimiters in a csv.

  1. Open your CSV.
  2. Ctrl + Shift + 8 (selects all data)
  3. Ctrl + h (opens find replace)
  4. Leave the find box blank so its looking for strings of 0 length.
  5. Enter a space in the replace with box.

This will loop through the whole CSV and force it to have the correct column count in delimiters ( , ) even if theres no data in that column.

If you're alright with Excel you can turn this into a macro too, so my macro (Ctrl + g) does this in one go. Creating a Macro

IKavanagh
  • 6,089
  • 11
  • 42
  • 47
Lewis
  • 624
  • 9
  • 16