Slow handling of 2-dimensional arrays in plpgsql

Question

I am programming a stored procedure in PostgreSQL. The algorithm should handle a 2 dimensional array of double precision numbers.

As far as I have investigated array operations in Postgres are generic and quite heavy. The simple example I'm trying to prove has an excessive computational cost.

Example:

CREATE OR REPLACE FUNCTION fill_2d_array( rows integer, cols integer) 
  RETURNS integer AS
$BODY$

DECLARE

img double precision[][];

i integer; j integer;
cont integer;

BEGIN

img  := ARRAY( SELECT 0 FROM generate_series(1, filas * columnas) ) ; 
cont:= 0;
For i IN 1..rows LOOP
    For j IN 1..cols LOOP
        img[i * cols + j] := (i * cols + j)::double precision;
        cont := cont + 1;
    END LOOP;
END LOOP;

return cont;
END;
$BODY$
  LANGUAGE plpgsql;

Can someone help me find an alternative path or an improvement to handle two-dimensional arrays?

I can't decypher what are you trying to accomplish... Why can't you just use arrayagg() instead of loops? — kworr, Sep 30 '13 at 08:09
`filas` and `columnas` are undefined. I suppose that should be `rows` and `cols`? And please describe what trying to do. Since you are only returning an integer, the whole operation seems pointless? — Erwin Brandstetter, Oct 01 '13 at 21:58

Erwin Brandstetter · Answer 1 · 2022-04-10T00:27:16.667

Procedural function

Basic problems

Declaring the dimensions of an array variable, like float8[][] for a 2-dimensional array, only serves documentation. Consider details in this related answer:

Mapping PostgreSQL text[][] type and Java type

You confused 1-dimenstional and 2-dimensional arrays. While declaring a 2-dimenstional array (to no effect), you only make it out to be a 1-dimensional array.

To initialize an array, use array_fill():

img := array_fill(0, ARRAY[rows,cols])

This example produces a 2-dimensional array - as opposed to your faulty statement, producing a 1-dimensional array:

img  := ARRAY( SELECT 0 FROM generate_series(1, rows* cols) );  -- wrong!

The displayed array subscripts img[i * cols + j] hardly make sense. The maximum would be twice of what you initialized, resulting in "out-of-bound" errors. I suppose you mean img[i][j].

Working version

Everything put together it could work like this:

CREATE OR REPLACE FUNCTION f_array_fill(rows integer, cols integer, OUT img float8[][]) 
  LANGUAGE plpgsql AS
$func$
DECLARE
   i int;
   j int;
BEGIN
   img := array_fill(0, ARRAY[rows,cols]);

   FOR i IN 1 .. rows LOOP
       FOR j IN 1 .. cols LOOP
           img[i][j] := (i * cols + j)::float8;
       END LOOP;
   END LOOP;
END
$func$;

Call:

SELECT f_array_fill(2,3);

Result:

{{4,5,6},{7,8,9}}

To make the function useful, return the produced array. Using an OUT parameter for that.

Superior set-based version

Looping and individual assignments are comparatively slow in PL/pgSQL. See:

Function Performance

Array handling performs particularly poorly as explained by @Craig in this related answer:

Why is PostgreSQL array access so much faster in C than in PL/pgSQL?

A set-based approach used to scale better with old versions. But the difference is mostly gone with modern Postgres.

Aggregate function for multi-dimensional arrays

In Postgres 9.4 or older, we needed a custom aggregate function. array_agg() only produced 1-dimensional arrays. Since Postgres 9.5 we can just use built-in functions. See.

Initial array in function to aggregate multi-dimensional array

SQL function doing the the same as the above:

CREATE OR REPLACE FUNCTION f_array_fill_sql(_rows integer, _cols integer)
  RETURNS float8[][]  -- 2nd dim only for documentation
  LANGUAGE sql AS
$func$
SELECT array_agg(arr1) AS arr2
FROM  (
   SELECT array_agg((i * $2 + j)::float8) AS arr1
   FROM        generate_series(1, _rows) i
   CROSS  JOIN generate_series(1, _cols) j
   GROUP  BY i
   ORDER  BY i
   ) sub
$func$;