68

For example, I have a file customers.json which is an array of objects (strictly formed) and pretty plain (without nested objects), like this (importantly, it already includes ids):

[
  {
    "id": 23635,
    "name": "Jerry Green",
    "comment": "Imported from facebook."
  },
  {
    "id": 23636,
    "name": "John Wayne",
    "comment": "Imported from facebook."
  }
]

And I want to import them all into my postgres db, into a table customers.

I found some pretty convoluted approaches where I would import the file into a json-typed column of a table like imported_json, with a column named data holding the objects, and then use SQL to extract those values and insert them into a real table.

But is there a simple way of importing JSON into Postgres with no touching of sql?

Jerry Green
  • "*with no touching of sql*" no. The only way to interact with a relational database that speaks SQL is - well - SQL. – a_horse_with_no_name Aug 30 '16 at 09:59
  • @a_horse_with_no_name oh... I'm pretty sure then that I can simply convert my json to a sql query. I will research this :) – Jerry Green Aug 30 '16 at 10:04
  • @a_horse_with_no_name Not true. You can use a library that wraps SQL. In Ruby we have ActiveRecord and Sequel. – Keith Bennett Feb 14 '20 at 09:49

6 Answers

76

It turns out there's an easy way to import a multi-line JSON object into a JSON column in a postgres database using the command line psql tool, without needing to explicitly embed the JSON into the SQL statement. The technique is documented in the postgresql docs, but it's a bit hidden.

The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as:

{
  "dog": "cat",
  "frog": "frat"
}

We can use the following SQL to load it into a temporary table:

sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;

which gives the result:

               j                
────────────────────────────────
 {"dog": "cat", "frog": "frat"}
(1 row)

You can also perform operations on the data directly:

sql> select :'content'::jsonb -> 'dog';
 ?column? 
──────────
 "cat"
(1 row)

Under the covers this is just embedding the JSON in the SQL, but it's a lot neater to let psql perform the interpolation itself.
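
For the customers.json from the question, the same trick can feed the data straight into a real table. A minimal sketch, assuming a customers table with matching id, name and comment columns already exists (jsonb_populate_recordset is the jsonb counterpart of the function used in the next answer):

sql> \set content `cat customers.json`
sql> insert into customers (id, name, comment) select p.* from jsonb_populate_recordset(null::customers, :'content'::jsonb) as p;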

Doctor Eval
  • ^ this is gold. Just imported 26000 records into my db in less than 5 minutes. Not a single step took more than a second. – fgblomqvist Feb 09 '18 at 07:17
  • how do I add an index column to the table by extracting a specific field from each json dict? – sds Jul 17 '18 at 15:46
  • I think this will answer your question: https://stackoverflow.com/questions/17807030/how-to-create-index-on-json-field-in-postgres-9-3 – Doctor Eval Jul 18 '18 at 23:26
  • Great feature. 'type' is the 'cat' equivalent in Windows. – woddle Jan 15 '19 at 14:00
  • @DoctorEval This is incredibly useful. Could you do the same thing from a plpgsql function? – knowads Apr 04 '19 at 21:56
  • @knowads I think that plpgsql has some file access functions, but the problem is that any function would execute in the context of the server rather than the psql shell. However, you could certainly load data into a psql variable (as in my example) and pass that variable into a plpgsql function for processing, no problem. – Doctor Eval Apr 04 '19 at 22:59
  • I got an out of memory error when using your solution for a 2gb file. Million records. `ERROR: out of memory DETAIL: Cannot enlarge string buffer containing 0 bytes by 1970793963 more bytes.` – scientific_explorer Apr 03 '21 at 12:16
  • As I noted, this is just interpolation - your entire 2GB file is loaded into psql. You might want to use a tool such as `jq` to split the file into smaller segments. – Doctor Eval Apr 04 '21 at 13:33
74

You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this:

with customer_json (doc) as (
   values 
    ('[
      {
        "id": 23635,
        "name": "Jerry Green",
        "comment": "Imported from facebook."
      },
      {
        "id": 23636,
        "name": "John Wayne",
        "comment": "Imported from facebook."
      }
    ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;

New customers will be inserted, existing ones will be updated. The "magic" part is json_populate_recordset(null::customer, doc), which generates a relational representation of the JSON objects.
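
To see what that function does on its own, you can run it directly against a JSON literal (a quick illustration using one object from the question's data); it returns one typed (id, name, comment) row per array element:

select p.*
from json_populate_recordset(null::customer,
       '[{"id": 23635, "name": "Jerry Green", "comment": "Imported from facebook."}]'::json) as p;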


The above assumes a table definition like this:

create table customer 
(
  id        integer primary key,
  name      text not null,
  comment   text
);

If the data is provided as a file, you need to first put that file into some table in the database. Something like this:

create unlogged table customer_import (doc json);

Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):

\copy customer_import from 'customers.json' ....

Then you can use the above statement, just remove the CTE and use the staging table:

insert into customer (id, name, comment)
select p.*
from customer_import l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;
  • I appreciate your help, but I found it better to convert my json to sql (with ruby in my case) and then simply import the sql with the psql command. It's easier for me than parsing json objects with sql :) Btw thx. – Jerry Green Aug 31 '16 at 15:21
  • can you also help me with this? https://stackoverflow.com/questions/65602905/error-extra-data-after-last-expected-column-in-postgres/65603032#65603032 – vikrant rana Jan 07 '21 at 07:40
  • @a_horse_with_no_name and anyone who needs to import json file. This was just tremendously helpful. Still don't understand how the cross join lateral works... but it works. – MC Hammerabi Nov 05 '21 at 12:55
36

In near-big-data cases, the most efficient way to import JSON from a file without using any external tool appears to be not importing a single JSON document, but rather a single-column CSV: a list of one-line JSON objects:

data.json.csv:

{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}

then, under psql:

create table t ( j jsonb );
\copy t from 'd:\path\data.json.csv' csv quote e'\x01' delimiter e'\x02'

One record per JSON line will be added to the t table.
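
From there, a minimal sketch for moving the rows into the customers table from the question, using the standard ->> operator (and assuming that table already exists):

insert into customers (id, name, comment)
select (j->>'id')::int, j->>'name', j->>'comment'
from t;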

"\copy from" import was made for csv, and as such loads data line by line. As a result reading one json per line rather than a single json array to be later splited, will not use any intermediate table and will achieve high throughput.

More of that you will less likely hit the max input line-size limitation that will arise if your input json file is too big.

I would thus first convert your input into a single-column CSV and then import it using the copy command, for example as sketched below.
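
To produce the one-object-per-line file from the question's customers.json, a one-liner with jq is enough (jq is an assumption here; any tool that emits one compact JSON object per line will do, the -c flag prints compact single-line output):

jq -c '.[]' customers.json > data.json.csv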

user1767316
  • Thanks, this was a very small but useful bit of information I didn't know. – Kimmo Hintikka Jul 07 '20 at 08:14
  • Can confirm, I'm importing close to 30-40GBs of files within 2 mins using this approach. Can read more about it in my SO Q&A [here](https://stackoverflow.com/questions/66930940/how-do-i-copy-import-a-json-file-into-postgres) – scientific_explorer Apr 03 '21 at 18:57
  • Converting to single-line json might not be that easy, and even if COPY accepts binary you might prefer single-line json. To that end, the [importing json data blog post](http://adpgtech.blogspot.com/2014/09/importing-json-data.html) referenced from the related post (mentioned by @scientific_explorer) might be helpful, but you might also like the --raw-output option of [jq](https://stedolan.github.io/jq/manual/#Invokingjq). – user1767316 Apr 21 '21 at 19:48
  • What worked for me was adding: `csv quote e'\x01' delimiter e'\x02';` from the link that the other comment points to. – Rutrus May 18 '21 at 14:52
  • For me, putting these two lines in the same line worked! `\copy t from 'd:\path\data.json.csv' csv quote e'\x01' delimiter e'\x02';` – Mehrdad Salimi Aug 22 '22 at 14:30
8

You can use spyql. Running the following command would generate INSERT statements that you can pipe into psql:

$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql"
INSERT INTO "customer"("id","name","comment") VALUES (23635,'Jerry Green','Imported from facebook.'),(23636,'John Wayne','Imported from facebook.');

jq is used to transform the json array into json lines (1 json object per line) and then spyql takes care of converting json lines into INSERT statements.

To import the data into PostgreSQL:

$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql" | psql  -U your_user_name -h your_host your_database

Disclaimer: I am the author of spyql.

Daniel C. Moura
  • Really nice tool. This helps when importing from a .json.gz file without having to decompress it first. The argument `-Ochunk_size=50` is useful for making sure that the SQL commands are split in chunks so that they don't take up lots of memory in big files. Thanks for suggesting it. – Tiago Jan 12 '22 at 17:51
  • Brilliant tool, thanks a lot for creating it and for the example here! – Greg Sadetsky Apr 20 '22 at 00:31
2

If you want to do it from a command line ...

NOTE: This isn't a direct answer to your question, as this will require you to convert your JSON to SQL. You will probably have to deal with JSON 'null' when converting anyway. You could use a view or materialized view to make that problem invisible-ish, though.

Here is a script I've used for importing JSON into PostgreSQL (WSL Ubuntu), which basically requires that you mix psql meta commands and SQL in the same command line. Note use of the somewhat obscure script command, which allocates a pseudo-tty:

$ more update.sh
#!/bin/bash
wget <filename>.json
echo '\set content `cat $(ls -t <redacted>.json.* | head -1)` \\ delete from <table>; insert into <table> values(:'"'content'); refresh materialized view <view>; " | PGPASSWORD=<passwd> psql -h <host> -U <user> -d <database>
$

(Copied from my answer at Shell script to execute pgsql commands in files)
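
For readability, the same psql invocation can be written with a quoted here-document instead of the echo one-liner, which avoids the tricky shell quoting. This is just a sketch that keeps the original's placeholder names:

#!/bin/bash
# The quoted 'SQL' delimiter stops the shell from expanding the backticks,
# so psql's \set runs the cat itself and loads the newest JSON file.
PGPASSWORD=<passwd> psql -h <host> -U <user> -d <database> <<'SQL'
\set content `cat $(ls -t <redacted>.json.* | head -1)`
delete from <table>;
insert into <table> values(:'content');
refresh materialized view <view>;
SQL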

Joseph H
  • Thank you! This is amazing! I use this approach to load thousands of JSON files to database. It is fast and easy. – Y Bai Sep 22 '22 at 20:35
1

Another option is to use sling. See this blog post which covers loading JSON files into PG. You could simply pipe your json file like this:

$ export POSTGRES='postgresql://...'

$ sling conns list
+------------+------------------+-----------------+
| CONN NAME  | CONN TYPE        | SOURCE          |
+------------+------------------+-----------------+
| POSTGRES   | DB - PostgreSQL  | env variable    |
+------------+------------------+-----------------+

$ cat /tmp/records.json | sling run --tgt-conn POSTGRES --tgt-object public.records --mode full-refresh
11:09AM INF connecting to target database (postgres)
11:09AM INF reading from stream (stdin)
11:09AM INF writing to target database [mode: full-refresh]
11:09AM INF streaming data
11:09AM INF dropped table public.records
11:09AM INF created table public.records
11:09AM INF inserted 500 rows in 0 secs [1,556 r/s]
11:09AM INF execution succeeded

Using debug mode would show a DDL of create table if not exists public.records ("data" jsonb). If you would like to flatten your JSON, sling does that as well by adding the --src-options 'flatten: true' option:

$ cat /tmp/records.json | sling run --src-options 'flatten: true' --tgt-conn POSTGRES --tgt-object public.records --mode full-refresh

The DDL in that case would be something like:

create table if not exists public.records ("_id" varchar(255),
"age" integer,
"balance" varchar(255),
"company__about" text,
"company__address" varchar(255),
"company__email" varchar(255),
"company__latitude" numeric,
"company__longitude" numeric,
"company__name" varchar(255),
"company__phone" varchar(255),
"company__registered" varchar(255),
"isactive" bool,
"name" varchar(255),
"picture" varchar(255),
"tags" jsonb)

FYI, I am the author of sling.

Fritz Larco