
I'm trying to run a plpgsql function on a multi-dimensional array column (int[][]) in a select query.

The function is as such:

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
RETURNS float[] AS
$function$
DECLARE
  s int[];
  a float[];
BEGIN
    FOREACH s SLICE 1 IN ARRAY $1 LOOP
        a := a || s[2]::float / s[1]::float;
    END LOOP;
    RETURN a;
END;
$function$
LANGUAGE plpgsql VOLATILE;

The following query works:

SELECT reduce_to_percentages(ARRAY[ARRAY[100, 20], ARRAY[300, 50]]);

So does the following query:

SELECT reduce_to_percentages((SELECT counts FROM objects LIMIT 1));

But the following query provides the function with a null value, and causes an exception when trying to FOREACH over $1:

SELECT reduce_to_percentages(counts) FROM objects;
Yotam Ofek

2 Answers


You can fix the error by checking for NULL values, as @klin demonstrated.
But that's putting lipstick on a pig. Rewrite the function to replace the procedural loop with a set-based solution. That's typically faster (especially when inlined in the context of an outer query), simpler, and automatically null-safe:

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[])
  RETURNS float[] AS
$func$
SELECT ARRAY (SELECT $1[d][2]::float / $1[d][1]
              FROM   generate_subscripts($1,1) d)
$func$  LANGUAGE sql IMMUTABLE;
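With the set-based version, the query from the question works directly, because generate_subscripts() simply produces no rows for a NULL array, so ARRAY(...) yields an empty array instead of raising an exception (assuming the objects table from the question):

SELECT reduce_to_percentages(counts) FROM objects;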

Alternatively, you could declare your function as STRICT (synonym: RETURNS NULL ON NULL INPUT). Minor difference: it returns NULL for NULL input, not an empty array ('{}') like the above.

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
  RETURNS float[] AS
$function$
 ...
$function$ LANGUAGE plpgsql IMMUTABLE STRICT;

The manual:

RETURNS NULL ON NULL INPUT or STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically.

But the simple SQL function above is preferable for multiple reasons.
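To illustrate the difference with a NULL argument (my own demonstration, not part of the original answer):

SELECT reduce_to_percentages(NULL::int[]);
-- SQL version above: returns '{}' (empty array)
-- STRICT plpgsql version: returns NULL; the function body is never executed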

Erwin Brandstetter
  • Wow, very thorough explanation, thank you very much!!! As for `int[][]` being ignored, I did know that, but for readability I wrote the function to be more explicit about the type of value it accepts. Two things I still don't understand: 1) what is the difference between immutable and volatile functions, and what makes this function immutable? 2) why did @klin's answer solve my problem? He only added a case for null values, which returns an empty list, but using his function made my query work (for non-null values). – Yotam Ofek Jun 11 '16 at 13:41
  • @YotamOfek: Ad 1.) [I added a link to more explanation](http://stackoverflow.com/questions/28569415/how-do-immutable-stable-and-volatile-keywords-effect-behaviour-of-function/28573737#28573737). Ad 2.) The error you saw was raised by the `FOREACH` construct that does not work for NULL input. – Erwin Brandstetter Jun 11 '16 at 14:25
  • *putting lipstick on a pig* - you have a specific sense of humor, but I like it. You know that your solution is faster, or just guessing? – klin Jun 11 '16 at 21:43
  • @klin: I wrote *typically*. And yes, I *know* from many benchmarks. It also makes sense, given the underlying mechanisms. I assume you are aware that ["putting lipstick on a pig" is a rhetorical expression](https://en.wikipedia.org/wiki/Lipstick_on_a_pig). No offense intended. And your answer would be the lipstick in this, not the pig. I'll comment on the benchmark under your answer. – Erwin Brandstetter Jun 12 '16 at 01:55

In the function body, just check whether the argument is not null. If you want to return an empty array for a null argument, add an initial value for the variable a in the DECLARE section (otherwise the function will return null).

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
RETURNS float[] AS
$function$
DECLARE
  s int[];
  a float[] = '{}';     -- initial value
BEGIN
    IF $1 IS NOT NULL THEN
        FOREACH s SLICE 1 IN ARRAY $1 LOOP
            a := a || s[2]::float / s[1]::float;
        END LOOP;
    END IF;
    RETURN a;
END;
$function$
LANGUAGE plpgsql VOLATILE;
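With the NULL check (and the initialized variable), the query from the question no longer raises an exception:

SELECT reduce_to_percentages(counts) FROM objects;
-- rows where counts IS NULL now yield '{}'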

I ran a simple test. This is the setup:

create table test (val int[]);
insert into test
select (array[array[ri(100), ri(100)], array[ri(100), ri(100)]])
from generate_series(1, 1000000);

ri(100) is my function that returns a random integer between 1 and 100. So the table contains a million rows, each with a single column holding an array of two two-element integer arrays. I tried to keep the test as simple and typical as possible.
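ri() itself is not shown above; a minimal definition matching its description might look like this (my sketch, not klin's actual function):

CREATE OR REPLACE FUNCTION ri(n int)
  RETURNS int AS
$$ SELECT floor(random() * $1)::int + 1 $$  -- random integer in 1..n
LANGUAGE sql VOLATILE;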

The following query has been executed 10 times with Erwin's function and 10 times with my variant:

select sum(v[1]), sum(v[2])
from (
    select reduce_to_percentages(val) v
    from test
    ) s;

Average execution time of ten tests:

  • Erwin's function 11940 ms
  • klin's function 4750 ms

Maybe my function is a pig with lipstick on its snout but it is a speedy one.

klin
  • Thank you for the answer, but the column value is not null, and is defined as such. That's why using a subquery works. – Yotam Ofek Jun 10 '16 at 18:42
  • No, the subquery works because you have `limit 1` (and the selected value is not null). Just try the function. – klin Jun 10 '16 at 20:16
  • It works! Can you please explain why? I mean, as far as I can see in what you changed in the function, you're simply returning an empty array in the event of a null value. How does it end up working, as opposed to not checking for null? Thanks!! – Yotam Ofek Jun 10 '16 at 23:02
  • Your original function was not prepared for null values and threw an exception in `FOREACH`. Your only problem was that `FOREACH` was executed for null input; the fix simply prevents that. – klin Jun 11 '16 at 00:35
  • I like benchmarks. Effective performance depends on many factors - as I am sure you know. You tested with array-length 2. For just two elements, an array type would be a big waste. Separate columns would make more sense. An array makes sense for a sizeable (variable) number of elements. Repeat your benchmark with 20 or 100 elements and you'll see the set-based solution scales much better. (I tested on 9.5!) PL/pgSQL plays into this, too. Sometimes reduces overhead, but can be much more expensive where a simple SQL alternative can be inlined in the context of the outer query. – Erwin Brandstetter Jun 12 '16 at 01:59
  • While being at it, test your function with volatility `STABLE` or `IMMUTABLE`. Depending on the test, this can make a substantial difference ... – Erwin Brandstetter Jun 12 '16 at 02:01
  • I knew that with increasing cardinality of the arrays, the cost of the call of `generate_subscripts()` will be better compensated. I only wanted to point out that categorical statements sometimes tend to be risky. Making the function `immutable` makes sense mainly when we are going to call it from where clause. Anyway I consider our polemic fruitful. – klin Jun 12 '16 at 09:06