0

My SQL script looks like below

create table "cad".yield_name
(
    yield_id     int primary key,
    yield_name   varchar,
    yield_name_f varchar
);

copy cad.yield_name
    from '/data/cad/base/YIELD\ NAME.csv'
    delimiter ',' csv header;

As you see, I am importing a file called '/data/cad/base/YIELD\ NAME.csv', which is available in the directory when I do ls -ltr.

total 30724
-rw-r--r--    1 postgres postgres    578264 Aug 31  2015 CONVERSION FACTOR.csv
-rw-r--r--    1 postgres postgres      1155 Aug 31  2015 FOOD GROUP.csv
-rw-r--r--    1 postgres postgres    761846 Aug 31  2015 FOOD NAME.csv
-rw-r--r--    1 postgres postgres      2850 Aug 31  2015 FOOD SOURCE.csv
-rw-r--r--    1 postgres postgres     12740 Aug 31  2015 NUTRIENT NAME.csv
-rw-r--r--    1 postgres postgres      2229 Aug 31  2015 NUTRIENT SOURCE.csv
-rw-r--r--    1 postgres postgres    162305 Aug 31  2015 REFUSE AMOUNT.csv
-rw-r--r--    1 postgres postgres   4196317 Aug 31  2015 REFUSE NAME.csv
-rw-r--r--    1 postgres postgres   4193826 Aug 31  2015 YIELD NAME.csv
-rw-r--r--    1 postgres postgres   1665394 Sep 21  2015 CNF 2015 users_guide EN.pdf
-rw-r--r--    1 postgres postgres   1637393 Sep 21  2015 CNF 2015 users_guide FR.pdf
-rw-r--r--    1 postgres postgres    893270 Sep 21  2015 CNF 2015 Database structure EN .pdf
-rw-r--r--    1 postgres postgres    972920 Sep 21  2015 CNF 2015 database structure FR .pdf
-rw-r--r--    1 postgres postgres     50521 Oct  6  2015 MEASURE NAME.csv
-rw-r--r--    1 postgres postgres     40265 Oct  9  2015 YIELD AMOUNT.csv
-rw-r--r--    1 postgres postgres  16257583 Oct 30  2015 NUTRIENT AMOUNT.csv

However, when I try to import, I get the following error on the command-line

2021-11-30 00:24:49.736 UTC [55] STATEMENT:  copy cad.yield_name
        from '/data/cad/base/YIELD\ NAME.csv'
        delimiter ',' csv header;
psql:sql/postgres/cad/base/1_yield_name.sql:10: ERROR:  could not open file "/data/cad/base/YIELD\ NAME.csv" for reading: No such file or directory
HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.

I am using PostgreSQL 14.1 docker image on Mac OSX. Could someone help me understand how can I import the file name with space using copy command?

Thank you

daydreamer
  • 87,243
  • 191
  • 450
  • 722
  • 1
    The backslash to escape the space is a thing done in *nix shells, not inside all other programs - and ISO SQL doesn't use backslashes for escape-sequences. Have you tried it without the backslash? i.e. `from '/data/cad/base/YIELD NAME.csv'` – Dai Nov 30 '21 at 00:38
  • @Dai, thank, when I tried with `/data/cad/base/YIELD NAME.csv`, I see `psql:sql/postgres/cad/base/1_yield_name.sql:10: ERROR: invalid byte sequence for encoding "UTF8": 0xe9 0x20 0x72 ` – daydreamer Nov 30 '21 at 00:44
  • 1
    Where did these CSV files come from? Do you know what encoding they're in? Maybe ISO-8859-1 or Windows-1252? – mu is too short Nov 30 '21 at 00:55

1 Answers1

2
  • Problem 1: Escaping spaces in filenames.

    • While using a backslash '\' to escape spaces ' ' is something done on Unix/Linux shells when filenames aren't enquoted, it isn't necessary in SQL.

      • It isn't necessary inside most (all?) non-shell-scripting program code as strings are always delimited (looking at you, PowerShell)
      • Also, I note that MySQL and PostgreSQL both support backslash escapes, neither use backslash escapes for space characters.
        • PostgreSQL only recognizes backslash escape sequences in E''-style strings, btw.
    • Anyway, simply remove the backslash:

      COPY cad.yield_name
           FROM '/data/cad/base/YIELD  NAME.csv'
           DELIMITER ',' csv header;
      
  • Problem 2: File encoding

    • In a comment reply, you wrote that you saw this error:

      ERROR: invalid byte sequence for encoding "UTF8": 0xE9 0x20 0x72

    • PostgreSQL assumes the CSV file uses UTF-8 encoding by default, however the character sequence 0xE9 0x20 0x72 strongly hints the file has ISO-8859-1 encoding.

      • This encoding is kinda obsolete now - you should prompt whoever made it to use UTF-8 instead.
    • So add the ENCODING parameter:

      COPY cad.yield_name
           FROM '/data/cad/base/YIELD  NAME.csv'
           DELIMITER ',' csv header ENCODING 'ISO 8859-1';
      
Dai
  • 141,631
  • 28
  • 261
  • 374
  • THANK YOU very much for the detailed answer. It did solve my problem and also educated me on the details. To answer the data source, it is published by Canadian Government, where it is difficult to ask them to change, I would, however, send them a request. Again, THANK YOU VERY MUCH! – daydreamer Dec 01 '21 at 16:18