
I call an API using requests to get JSON data and store it in a variable like this:

r = requests.post(url=endpoint , data=payload, headers=headers, params=parameter)
all = json.loads(r.text)

Then I use a loop to insert the data row by row like this:

for row in all:
    sql = "INSERT INTO t VALUES ({},{},{});".format(row['id'], row['num'], row['type'])
    cur.execute(sql)

The real table has more columns, not just the 3 in this example.

This code works fine, but my question is: is there another way to insert JSON data into the table? I need to insert around 4-5 thousand rows per request, which takes a very long time (compared with copy_expert on a CSV file) since it inserts row by row. Is there a way to avoid the loop, or anything else that might make this insert faster?

I am using a PostgreSQL database with Python.

emp
  • I am not that into Python, but in Java there is something like a batch insert. This is exactly what you describe. Another way could be to create the query within the loop and execute it afterwards. Something like `INSERT INTO t VALUES (a,b,c), (b,c,d), (d,e,f), ...` (just creating the values instead of the whole query)? – S-Man Jun 24 '19 at 13:25
  • Does [this](https://stackoverflow.com/questions/41493451/postgres-bulk-insert-function-using-json-arguments) help you in any way? – shahkalpesh Jun 24 '19 at 13:27
  • Your `format` arguments should be indexing `row`, not `all`, surely? – holdenweb Jun 24 '19 at 13:30
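The multi-row INSERT the comments suggest can be sketched in plain Python: build one statement with a `(%s,%s,%s)` group per row and pass all the values as parameters, which keeps the query safe from SQL injection. The helper name below is made up for illustration; with psycopg2 you would then run `cur.execute(sql, params)`, and `psycopg2.extras.execute_values` does essentially the same thing for you.

```python
# Build a single multi-row INSERT instead of executing one INSERT per row.
# Placeholder style (%s) matches psycopg2; the helper name is hypothetical.
def build_multirow_insert(table, rows):
    # one "(%s,%s,...)" group per row, joined with commas
    group = "(" + ",".join(["%s"] * len(rows[0])) + ")"
    sql = "INSERT INTO {} VALUES {}".format(table, ",".join([group] * len(rows)))
    # flatten the rows into one parameter list
    params = [value for row in rows for value in row]
    return sql, params

rows = [(1, 10, 'a'), (2, 20, 'b')]
sql, params = build_multirow_insert("t", rows)
# sql is "INSERT INTO t VALUES (%s,%s,%s),(%s,%s,%s)"
# then: cur.execute(sql, params)
```

Sending thousands of rows in one statement (or in a few batches) avoids a network round trip per row, which is where most of the time goes.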

2 Answers


This technique should be the fastest since it doesn't involve any client-side loops or parsing, but it requires that the keys in your JSON structure match your column names.

INSERT INTO t 
SELECT * from json_populate_recordset(null::t, your_json);

Here's an example:

create table t (a int, b int, c int);
insert into t 
select * from json_populate_recordset(null::t, 
             '[{"a": 1, "b": 2}, {"c": 3, "a": 1}, {"b": 4, "a": 7, "c": 9}, {"d": 5}]'
);
select * from t;
 a | b | c
---+---+---
 1 | 2 |
 1 |   | 3
 7 | 4 | 9
   |   |
(4 rows)

Note that the last row is all null because it only had "d" in it, which is not a column in t.

In python, this would translate to something like this:

r = requests.post(url=endpoint , data=payload, headers=headers, params=parameter)
all = r.text # No need to parse this, unless you want to check it's valid
cur.execute('insert into t select * from json_populate_recordset(null::t, %s)', [all])
Jeremy
  • I adapted it a bit and it works perfectly. But are there any constraints I should be aware of, other than the JSON keys having the same names as the columns? – emp Jul 09 '19 at 07:16
  • Not really, just be aware that it's all or nothing. If there's malformed data somewhere in your json, you won't be able to catch the exception and skip that row. Instead, you will insert nothing. – Jeremy Jul 09 '19 at 18:29

You will find that DBAPI cursors have not only an execute method but also an executemany method that was designed for just such purposes as yours.

Call it with a SQL statement as the first argument and a sequence of tuples or lists, each of which is destined to become a row in a database.

Please note that your existing code is very dangerously structured, because it is vulnerable to SQL injection attacks vectored through the remote site that provides the JSON. Note also that a requests.Response object has a .json method to save you the trouble of the conversion. Since you don't quote the table description I can't know the column names, but also please realise that SQL INSERT statements are much more robust when you give explicit column names. The existing code might be invalidated by a restructuring of the table that put the columns in a different order.

Putting all this knowledge together would give you something like the following (untested) code.

r = requests.post(url=endpoint, data=payload, headers=headers, params=parameter)

data = [(row['id'], row['num'], row['type']) for row in r.json()]
cur.executemany("INSERT INTO t VALUES (%s, %s, %s)", data)

Don't forget to commit the changes when you are finished!

holdenweb