
Is there a way to COPY the CSV file data directly into a JSON or JSONb array?

Example:

CREATE TABLE mytable (
    id serial PRIMARY KEY,
    info jsonb -- or JSON
);
COPY mytable(info) FROM '/tmp/myfile.csv' HEADER csv;

NOTE: each CSV line is mapped to a JSON array. It is a normal CSV.


Normal CSV (no embedded JSON)... /tmp/myfile.csv =

a,b,c
100,Mum,Dad
200,Hello,Bye
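
So, after the desired COPY, the expected content of mytable.info would be (exactly what json_build_array(a,b,c) in the workaround below produces):

SELECT info FROM mytable;
-- [100, "Mum", "Dad"]
-- [200, "Hello", "Bye"]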

The correct COPY command must be equivalent to the usual COPY below.

Usual COPY (ugly but works fine)

CREATE TEMPORARY TABLE temp1 (
  a int, b text, c text
);
COPY temp1(a,b,c) FROM '/tmp/myfile.csv' HEADER csv;

INSERT INTO mytable(info) SELECT json_build_array(a,b,c) FROM temp1;

It is ugly because:

  • it needs a priori knowledge of the fields, and a previous CREATE TABLE with them.

  • for "big data" it needs a big temporary table, wasting CPU, disk and my time; the table mytable has CHECK and UNIQUE constraints for each line.

  • ... it needs more than one SQL command.

  • See http://stackoverflow.com/questions/10824165/converting-a-csv-file-to-json-using-c-sharp – Jim Moore Jan 04 '17 at 14:14
  • Show us the content of the file. But if it contains valid [JSON literals](https://www.postgresql.org/docs/current/static/datatype-json.html#JSON-KEYS-ELEMENTS) then this should work. What happened when you tried? – a_horse_with_no_name Jan 04 '17 at 14:32
  • @a_horse_with_no_name thanks. Well, it works, but not in an elegant way (I am using "big data"), needs some steps and a temporary table... I edited, see example. – Peter Krauss Jan 04 '17 at 15:21
  • Thanks @JimMoore, yes, using an external program we can do anything... My question is about "how to" with PostgreSQL 9.6+. – Peter Krauss Jan 04 '17 at 15:22
  • You can avoid insert into table using file_fdw foreign data wrapper and read data directly from the file. Although it still requires previous knowledge of columns. – JosMac Jan 04 '17 at 16:17
  • @JosMac, I think it will be a good answer! The "temporary table" is the big problem... Elegance and the [*a priori* knowledge](http://frictionlessdata.io/guides/json-table-schema/) I can overlook ;-) – Peter Krauss Jan 04 '17 at 16:41
  • here is an example - http://postgresql.freeideas.cz/file_fdw-extension-small-hints/ – JosMac Jan 04 '17 at 16:51

1 Answer


Perfect solution!

No need to know all the CSV columns, only extract what you know.

Use CREATE EXTENSION plpythonu; in SQL: if the command produces an error like "could not open extension control file ... No such file", you need to install the pg-py extra packages. On standard Ubuntu (16 LTS) this is simple: apt install postgresql-contrib postgresql-plpython.
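
That is, as commands (the sudo is my assumption for a standard setup):

CREATE EXTENSION plpythonu;  -- must succeed before creating the function below
-- shell (Ubuntu 16 LTS):
--   sudo apt install postgresql-contrib postgresql-plpython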

CREATE FUNCTION get_csvfile(
  file text,
  delim_char char(1) = ',',
  quote_char char(1) = '"')
returns setof text[] stable language plpythonu as $$
  import csv
  # plpythonu is Python 2, so open in 'rb'; with plpython3u use open(file, newline='')
  return csv.reader(
     open(file, 'rb'),
     quotechar=quote_char,
     delimiter=delim_char,
     skipinitialspace=True,
     escapechar='\\'
  )
$$;

INSERT INTO mytable(info)
  SELECT jsonb_build_array(c[1],c[2],c[3]) 
  FROM get_csvfile('/tmp/myfile1.csv') c;

The split_csv() function was defined here. The csv.reader is very reliable (!).

Not tested with very big CSV files... but Python is expected to do the job.
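
One caveat (the WHERE filter is my sketch; the regex assumes the sample file above, where data rows start with a number): get_csvfile() also returns the header line ('a,b,c'), because csv.reader does not skip it, so filter it out when the file has a header:

INSERT INTO mytable(info)
  SELECT jsonb_build_array(c[1],c[2],c[3])
  FROM get_csvfile('/tmp/myfile1.csv') c
  WHERE c[1] ~ '^\d+$';  -- skips the 'a,b,c' header row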


PostgreSQL workaround

It is not a perfect solution, but it solves the main problem, that is the

"... big temporary table, wasting CPU, disk and my time ..."

This is the way we do it, a workaround with file_fdw!

  1. Adopt conventions to avoid file-copy and file-permission confusion... e.g. a standard file path for the CSV: /tmp/pg_myPrj_file.csv

  2. Initialise your database or SQL script with the magic extension,

   CREATE EXTENSION file_fdw;
   CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
  3. For each CSV file, myNewData.csv:

    3.1. Make a symbolic link (or scp remote copy) for your new file: ln -sf $PWD/myNewData.csv /tmp/pg_socKer_file.csv

    3.2. Configure the file_fdw foreign table for your new file (here temp1, used to populate mytable).

   CREATE FOREIGN TABLE temp1 (a int, b text, c text)
   SERVER files OPTIONS (
     filename '/tmp/pg_socKer_file.csv',
     format 'csv',
     header 'true'
   );

PS: after running the SQL script with psql, if you have a permission problem, change the owner of the link with sudo chown -h postgres:postgres /tmp/pg_socKer_file.csv.

    3.3. Use the file_fdw table as the source (here populating mytable).

 INSERT INTO mytable(info)
 SELECT json_build_array(a,b,c) FROM temp1;
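
A quick sanity check, as a sketch with the names above (the DROP is my suggestion, not part of the steps): the foreign table reads the CSV at query time, so no intermediate copy of the data is stored.

 SELECT count(*) FROM temp1;  -- scans /tmp/pg_socKer_file.csv directly
 DROP FOREIGN TABLE temp1;    -- optional cleanup after the INSERT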

Thanks to @JosMac (and his tutorial)!


NOTE: if there is a STDIN way to do it (does one exist?), it would be easy, avoiding permission problems and the use of absolute paths. See this answer/discussion.
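
A partial option in that direction (a sketch, assuming the temp1 table from above): psql's client-side \copy reads the file with the client's permissions, avoiding the server-side permission and absolute-path problems, though it still needs the intermediate table.

 \copy temp1(a,b,c) FROM '/tmp/myfile.csv' WITH (FORMAT csv, HEADER)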

  • For file permissions by ACL, see [this solution](http://unix.stackexchange.com/a/336675/130402) – Peter Krauss Jan 11 '17 at 22:44
  • Or just delimit the CSV with characters that exist neither in JSON or CSV... http://adpgtech.blogspot.co.uk/2014/09/importing-json-data.html – Daniel Cull Jan 19 '18 at 09:51
  • Hi @DanielCull, good workaround. About the main problem, performance (CPU-time and memory costs), the only solution is `FOREIGN TABLE`. In other contexts or conditions, yes, it is possible to use `csv quote e'\x01' delimiter e'\x02';` in internal procedures, but not as *public data-interchange* (e.g. see W3C's standards for semantic CSV or http://frictionlessdata.io interchange standards). – Peter Krauss Jan 20 '18 at 12:46