0

I have a big CSV file with all columns quoted with ". Null values are represented as "", and column separator is |. I would like to use the COPYcommand from postgresql (version 10.7) to ingest those files. I have tried many combination, but the more natural to me is the following:

COPY test.large
FROM '/path/to/big.file'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER '|',
    QUOTE '"',
    NULL ''
);

My underlying table expects a column for integers and there are lines where NULL value are set to ...|""|... instead of some number ...|"123456"|.... And unfortunately, this makes COPY to crash, saying:

ERREUR:  syntaxe en entrée invalide pour l'entier : «  »
CONTEXT:  COPY regpat_pct_app_reg, ligne 2743, colonne appid : «  »

Sorry, it is in French from this terminal. Anyway, it says: Invalid syntax for integer : « » At the line 2743 we found:

...000205"|""|"XY...

Which is a NULL value, but I cannot found how to properly setup COPY command switch to make postgresql ingest those files.

The documentation says:

NULL

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.

I am aware that I can clean the file by changing ...|""|... to ...||... using sed or some regexp/replace operations. This will solve my problem, I have checked it.

What I am wondering: is it possible to bear it with postgresql, after all this is perfectly valid CSV format.

Update

Reading the following post as suggested, I wrote:

COPY test.large
FROM '/path/to/big.file'
WITH (
        FORMAT CSV,
        HEADER,
        DELIMITER '|',
        QUOTE '"',
        NULL '',
        FORCE_NULL appid
    );

I get:

ERREUR:  l'argument de l'option « force_null » doit être une liste de noms de colonnes

Which translates to force_null must be a list of column names. Any idea?

jlandercy
  • 7,183
  • 1
  • 39
  • 57

2 Answers2

2

It seems it is a slight inconsistency in the COPY command switches, because the following call:

COPY test.large
FROM '/path/to/big.file'
WITH CSV HEADER DELIMITER '|' QUOTE '"' FORCE NULL appid;

Works as expected or you must add the _ if you use the key value pair notation:

COPY test.large
FROM '/path/to/big.file'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER '|',
    QUOTE '"',
    FORCE_NULL (appln_id)
);
jlandercy
  • 7,183
  • 1
  • 39
  • 57
1

This isn't NULL, it's an empty string which is a different thing. I'm not aware of a way to make PG treat an empty string as a null number. I recommend you do your import to a temporary table with a text column instead of an integer here, then move the data to the main table converting to number as appropriate

Take a look at this Q: Postgresql COPY empty string as NULL not work

Very similar to your scenario and the accepted answer there describes a similar technique

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • I agree and you'd maybe think that by telling OG that quote is '"' and null is '' that it will first remove the quotes from `,"",...` and find nothing inside them and then understand that it's the same as your null spec ''.. but it doesn't; it find empty string. If you'd said `quote '"' null 'nul'` and your csv was like `...,"nul",nul,...` you might think that they're both null, but one is SQL NULL as per spec and one is a string literal of `nul`. Does it make sense? Think of it like "it looks for null before it removes quotes, not after it removes quotes" – Caius Jard Oct 23 '19 at 18:32
  • I have tried something suggested in your reference. I have updated my post to reflect the error. Funny it requires a list of column names. – jlandercy Oct 23 '19 at 18:43
  • @jlandercy: you need to use `FORCE_NULL (appid)` (i.e a 1-element list). – Marth Oct 23 '19 at 18:47
  • Unfortunetley it does not work, but I have found a solution: `erreur de syntaxe sur ou près de « NULL » LINE 9: FORCE NULL (appid) ^` – jlandercy Oct 23 '19 at 18:54
  • @jlandercy: just to make sure, do you have an underscore between `FORCE` and `NULL`? – Marth Oct 23 '19 at 18:56
  • 1
    Ok I have found why we have this `_` or not discussion, see my answer. Thank you for taking time to solve my problem. – jlandercy Oct 23 '19 at 18:59