
To reduce some of the query load on my database, I currently batch multiple inserts together into a single INSERT statement with a variable number of rows. This produces queries like:

INSERT INTO example_table (column_1, column_2, ...)
VALUES ($1, $2, ...), ($3, $4, ...), ($5, $6, ...), ...
RETURNING "id";

Because the number of rows varies, each variant has a different statement text, so the pg_stat_statements extension treats them as many distinct queries.

I am looking to efficiently batch-insert several rows while allowing pg_stat_statements to aggregate all the query statistics together neatly. Is there some approach I can use to achieve this, perhaps by telling pg_stat_statements that these are all the same query type or by using prepared statements?

  • It is a potential improvement for `pg_stat_statements` not to differentiate by the length of lists. Why is it such a problem? You can aggregate the statistics for the variants in a `pg_stat_statements` query (a sketch of such a query follows these comments). – Laurenz Albe Aug 27 '19 at 04:13
  • I'd prefer to see the aggregate stats and to streamline the entries in `pg_stat_statements`. – ide Aug 27 '19 at 07:16
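Following up on the comment above, here is a minimal sketch of such an aggregation. It assumes the stock pg_stat_statements view with PostgreSQL 13+ column names (older versions call the timing column total_time):

-- Roll up every length variant of the multi-row INSERT into one set of numbers.
SELECT sum(calls) AS calls,
       sum(rows) AS rows,
       sum(total_exec_time) AS total_exec_time
FROM pg_stat_statements
WHERE query LIKE 'INSERT INTO example_table%';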

1 Answer


I addressed this by inserting rows with UNNEST, which allowed for a consistent query shape regardless of the number of rows inserted.

INSERT INTO example_table (column_1, column_2, ...)
  (SELECT * FROM UNNEST($1::uuid[], $2::varchar[], ...));

UNNEST worked because I didn't need DEFAULT for any column (there is no way to express DEFAULT inside an array). In some cases the array casts are not strictly necessary, but I added them to be explicit; they are also needed for some types, such as uuid[].
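
For illustration, here is a concrete call with hypothetical columns (column_1 as uuid, column_2 as varchar). Each array is a single literal, so whether it is sent inline or as a bound parameter, the normalized statement keeps the same shape regardless of how many rows the arrays carry:

-- Hypothetical table layout; pg_stat_statements normalizes each array
-- literal to one placeholder, so every call shares one query fingerprint.
INSERT INTO example_table (column_1, column_2)
  (SELECT * FROM UNNEST(
    '{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11,a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a12}'::uuid[],
    '{first,second}'::varchar[]
  ))
RETURNING "id";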

For my use case, the performance was comparable to that of INSERT queries with multi-row VALUES lists: UNNEST was faster on average but had a higher standard deviation across about 600k calls of each variant.
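
Conveniently, pg_stat_statements can report this spread directly once the inserts collapse into a single entry. A sketch, again assuming PostgreSQL 13+ column names (older versions use mean_time and stddev_time):

-- Per-statement timing spread for the (now single) INSERT entry.
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE query LIKE 'INSERT INTO example_table%';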
