
Here's a plpgsql function for Postgres 9.6. It tries to INSERT a row and, if the row is actually inserted (i.e. not skipped by ON CONFLICT DO NOTHING because of a key conflict), runs a few more commands.

CREATE FUNCTION foo(int, text, text)
  RETURNS void AS
$$
BEGIN
  INSERT INTO table1 (id, val1, val2) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING;
  IF FOUND THEN
    INSERT INTO table2 (table1_id, val1) VALUES ($1, $2);
    UPDATE table3 SET (val2, time) = ($3, now()) WHERE table1_id = $1;
  END IF;
END
$$ LANGUAGE plpgsql;

This function processes a single record, but how could you modify it to process a batch of thousands of records?

I found an answer which suggests making each of the 3 function arguments an array. But is there a way to do it where the arguments I pass in more closely represent how the records look in my application?

For example, the ideal solution would be for my application code to call select foo($1), where the parameter $1 is a JSON array of objects and each inner object is a record to be inserted:

[ 
  { "id": "1", "val1": "1-val1", "val2": "1-val2" },
  { "id": "2", "val1": "2-val1", "val2": "2-val2" },
  { "id": "3", "val1": "3-val1", "val2": "3-val2" },
  { "id": "4", "val1": "4-val1", "val2": "4-val2" }
]

The second-best option would be for my application code to call select foo($1, $2, $3, $4), where each parameter is a JSON object corresponding to a record to be inserted:

{ "id": "1", "val1": "1-val1", "val2": "1-val2" }  // This would be $1
{ "id": "2", "val1": "2-val1", "val2": "2-val2" }  // This would be $2

I'm looking at the various JSON functions offered by Postgres, and they seem relevant, but I can't figure out exactly which one to use. Is what I'm looking to do even possible? Would using JSON arrays instead of JSON objects anywhere make this possible?

user779159

1 Answer


For thousands of records

1. Create a temporary table of input rows, composed of your values $1, $2, $3. The fastest way to upload is COPY, or the \copy meta-command of psql if the data is not on the same machine as the database server. Let's suppose this table:

CREATE TEMP TABLE tmp(id int PRIMARY KEY, val1 text, val2 text);

I added a PK constraint, which is totally optional, but it makes sure we are dealing with unique not-null int values. If you can vouch for the input data, you don't need the constraint.
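For instance, a hypothetical upload could look like this (the file name, path and CSV format are assumptions, not part of the original question):

COPY tmp FROM '/path/to/input.csv' WITH (FORMAT csv, HEADER);  -- file readable by the server

-- or, from the client machine with psql:
-- \copy tmp FROM 'input.csv' WITH (FORMAT csv, HEADER)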

2. Chain your commands with data-modifying CTEs. As we have determined under your previous question, there are no race conditions to take care of in this particular operation.

WITH ins1 AS (
   INSERT INTO table1 AS t1 (id, val1, val2)
   SELECT id, val1, val2 FROM tmp ON CONFLICT DO NOTHING
   RETURNING t1.id, t1.val1, t1.val2  -- only actually inserted rows returned
   )
, ins2 AS (
   INSERT INTO table2 (table1_id, val1)
   SELECT id, val1 FROM ins1
   )
UPDATE table3 t3
SET    val2 = i.val2
     , time = now()
FROM   ins1 i
WHERE  t3.table1_id = i.id;

Steps 1 and 2 must run in the same session (not necessarily the same transaction), since the scope of temporary tables is bound to the session.

Note that the UPDATE only depends on the 1st INSERT; success of the 2nd INSERT is guaranteed, since it has no ON CONFLICT DO NOTHING and the whole operation would be rolled back if there were any conflict in the 2nd INSERT.


For just a couple of records

There are various options. Your idea of passing a JSON array to a function is one of them. If the objects match the target table, you can use json_populate_recordset() in a single INSERT query, or just use the INSERT (as a prepared statement) without a function wrapper.

INSERT INTO target_tbl  -- it's ok to omit target columns here
SELECT *
FROM   json_populate_recordset(null::target_tbl,  -- use same table type
          json '[{ "id": "1", "val1": "1-val1", "val2": "1-val2" },
                 { "id": "2", "val1": "2-val1", "val2": "2-val2" },
                 { "id": "3", "val1": "3-val1", "val2": "3-val2" },
                 { "id": "4", "val1": "4-val1", "val2": "4-val2" }]');

For just a handful of columns you might also pass an array for each column and loop through them in parallel. You can do this with a simple loop on the array index. Since Postgres 9.4 there is also the convenient unnest() with multiple parameters to do it all in a single query, as sketched below.
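Here is a minimal sketch of that multi-parameter unnest() call (the array literals are placeholder values); its result could stand in for the temp table tmp in the first query:

SELECT *
FROM   unnest('{1,2}'::int[]
            , '{1-val1,2-val1}'::text[]
            , '{1-val2,2-val2}'::text[]) AS t(id, val1, val2);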

The best solution depends on the data format you have.
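Combining the two approaches (as confirmed in the comments below), here is a rough sketch of a function that accepts a single JSON array, assuming the JSON keys match the columns of table1; the function name foo_batch is only illustrative:

CREATE FUNCTION foo_batch(_data json)
  RETURNS void AS
$$
WITH input AS (
   SELECT id, val1, val2
   FROM   json_populate_recordset(null::table1, _data)  -- JSON array -> rows of table1's type
   )
, ins1 AS (
   INSERT INTO table1 AS t1 (id, val1, val2)
   SELECT id, val1, val2 FROM input ON CONFLICT DO NOTHING
   RETURNING t1.id, t1.val1, t1.val2  -- only actually inserted rows
   )
, ins2 AS (
   INSERT INTO table2 (table1_id, val1)
   SELECT id, val1 FROM ins1
   )
UPDATE table3 t3
SET    val2 = i.val2
     , time = now()
FROM   ins1 i
WHERE  t3.table1_id = i.id;
$$ LANGUAGE sql;

The application would then call it with the JSON array from the question, e.g. SELECT foo_batch('[{ "id": "1", "val1": "1-val1", "val2": "1-val2" }]');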

Erwin Brandstetter
  • Thanks for this. (I'll explore your answer and try it.) But assuming it isn't thousands of rows, can I use JSON arguments as I described in the question? I'm asking not just for this query, but because I have several `plpgsql` functions where I'd like to be able to process several records from my application (like in a loop within the function), but I must pre-process the data in my app to make it look like separate arrays for each parameter (like in http://stackoverflow.com/a/24350689/779159). Are either of the options in my question possible (one array of objects, or several objects)? – user779159 Jan 06 '17 at 10:16
  • For example, could I use the `jsonb_to_recordset` function to pass in a single JSON array of objects, and loop through that recordset and run through the `INSERT`s for each record in the recordset? – user779159 Jan 06 '17 at 16:36
  • I came across your answer about loops in http://stackoverflow.com/a/19147320/779159. How could I loop through the `recordset` returned by `json_populate_recordset` (the **For just a couple of records** part of your answer above), so that I can do the `INSERT INTO`, `IF FOUND THEN`, etc from my question for each iteration of the loop? – user779159 Jan 30 '17 at 21:33
  • @user779159: You don't need to loop. Chain data-modifying CTEs like demonstrated in the first query. That's *much* faster and safer than looping for every row. – Erwin Brandstetter Jan 30 '17 at 21:47
  • In what ways would it be safer? Btw if I'm using a temp table, and there are multiple processes doing this bulk insert, would these temp tables conflict with each other? – user779159 Jan 30 '17 at 22:06
  • Actually I think the best outcome would be to combine your suggestions of `json_populate_recordset` and the CTE you wrote. So in the CTE if I replace `FROM tmp` with `FROM recordset`, it would be an effective solution in your opinion? – user779159 Jan 30 '17 at 22:09
  • @user779159: Yes, that's what I meant. You can use the result of the 2nd query with `json_populate_recordset()` in place of the temp table in the 1st query. – Erwin Brandstetter Jan 31 '17 at 02:20
  • perfect answer, I was looking for either an array way to do it or a json way to do it and you mentioned how to do both! And I learned about the idea of using a temporary table and COPY. – Andy May 06 '21 at 01:14