
I'm trying to run a plpgsql function on a multi-dimensional array column (int[][]) in a select query.

The function is as such:

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
RETURNS float[] AS
$function$
DECLARE
  s int[];
  a float[];
BEGIN
    FOREACH s SLICE 1 IN ARRAY $1 LOOP
        a := a || s[2]::float / s[1]::float;
    END LOOP;
    RETURN a;
END;
$function$
LANGUAGE plpgsql VOLATILE;

The following query works:

SELECT reduce_to_percentages(ARRAY[ARRAY[100, 20], ARRAY[300, 50]]);

So does the following query:

SELECT reduce_to_percentages((SELECT counts FROM objects LIMIT 1));

But the following query provides the function with a null value, and causes an exception when trying to FOREACH over $1:

SELECT reduce_to_percentages(counts) FROM objects;
Yotam Ofek

2 Answers


You can fix the error by checking for NULL values, as @klin demonstrated.
But that's putting lipstick on a pig. Rewrite the function to replace the procedural loop with a set-based solution. That's typically faster (especially when inlined in the context of an outer query), simpler, and automatically null-safe:

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[])
  RETURNS float[] AS
$func$
SELECT ARRAY (SELECT $1[d][2]::float / $1[d][1]
              FROM   generate_subscripts($1,1) d)
$func$  LANGUAGE sql IMMUTABLE;
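With the set-based version, the query from the question works directly, because generate_subscripts() simply produces no rows for a NULL array, so ARRAY(...) yields an empty array instead of raising an exception (assuming the objects table from the question):

SELECT reduce_to_percentages(counts) FROM objects;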

Alternatively, you could declare your function as STRICT (synonym: RETURNS NULL ON NULL INPUT). Minor difference: it returns NULL for NULL input, not an empty array ('{}') like the above.

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
  RETURNS float[] AS
$function$
 ...
$function$ LANGUAGE plpgsql IMMUTABLE STRICT;

The manual:

RETURNS NULL ON NULL INPUT or STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically.

But the simple SQL function above is preferable for multiple reasons.
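To illustrate the difference with a NULL argument (my own demonstration, not part of the original answer):

SELECT reduce_to_percentages(NULL::int[]);
-- SQL version above: returns '{}' (empty array)
-- STRICT plpgsql version: returns NULL; the function body is never executed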

Erwin Brandstetter
  • Wow, very thorough explanation, thank you very much!!! As for `int[][]` being ignored, I did know that, but for readability I wrote the function to be more explicit about the type of value it accepts. Two things I still don't understand: 1) what is the difference between immutable and volatile functions, and what makes this function immutable? 2) why did @klin's answer solve my problem? He only added a case for null values, which returns an empty list, but using his function made my query work (for non-null values). – Yotam Ofek Jun 11 '16 at 13:41
  • @YotamOfek: Ad 1.) [I added a link to more explanation](http://stackoverflow.com/questions/28569415/how-do-immutable-stable-and-volatile-keywords-effect-behaviour-of-function/28573737#28573737). Ad 2.) The error you saw was raised by the `FOREACH` construct that does not work for NULL input. – Erwin Brandstetter Jun 11 '16 at 14:25
  • *putting lipstick on a pig* - you have a specific sense of humor, but I like it. You know that your solution is faster, or just guessing? – klin Jun 11 '16 at 21:43
  • @klin: I wrote *typically*. And yes, I *know* from many benchmarks. It also makes sense, given the underlying mechanisms. I assume you are aware that ["putting lipstick on a pig" is a rhetorical expression](https://en.wikipedia.org/wiki/Lipstick_on_a_pig). No offense intended. And your answer would be the lipstick in this, not the pig. I'll comment on the benchmark under your answer. – Erwin Brandstetter Jun 12 '16 at 01:55

In the function body, just check whether the argument is not null. If you want to return an empty array for a null argument, add an initial value for the variable a in the DECLARE section (otherwise the function will return null).

CREATE OR REPLACE FUNCTION public.reduce_to_percentages(int[][])
RETURNS float[] AS
$function$
DECLARE
  s int[];
  a float[] = '{}';     -- initial value
BEGIN
    IF $1 IS NOT NULL THEN
        FOREACH s SLICE 1 IN ARRAY $1 LOOP
            a := a || s[2]::float / s[1]::float;
        END LOOP;
    END IF;
    RETURN a;
END;
$function$
LANGUAGE plpgsql VOLATILE;
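With the NULL check (and the initialized variable), the query from the question no longer raises an exception:

SELECT reduce_to_percentages(counts) FROM objects;
-- rows where counts IS NULL now yield '{}'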

I ran a simple test. This is the setup:

create table test (val int[]);
insert into test
select (array[array[ri(100), ri(100)], array[ri(100), ri(100)]])
from generate_series(1, 1000000);

ri(100) is my function that returns a random integer between 1 and 100. So the table contains a million rows, each with a single column holding an array of two two-element integer arrays. I tried to keep the test as simple and typical as possible.
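ri() itself is not shown above; a minimal definition matching its description might look like this (my sketch, not klin's actual function):

CREATE OR REPLACE FUNCTION ri(n int)
  RETURNS int AS
$$ SELECT floor(random() * $1)::int + 1 $$  -- random integer in 1..n
LANGUAGE sql VOLATILE;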

The following query has been executed 10 times with Erwin's function and 10 times with my variant:

select sum(v[1]), sum(v[2])
from (
    select reduce_to_percentages(val) v
    from test
    ) s;

Average execution time of ten tests:

  • Erwin's function 11940 ms
  • klin's function 4750 ms

Maybe my function is a pig with lipstick on its snout but it is a speedy one.

klin
  • Thank you for the answer, but the column value is not null, and is defined as such. That's why using a subquery works. – Yotam Ofek Jun 10 '16 at 18:42
  • No, the subquery works because you have `limit 1` (and the selected value is not null). Just try the function. – klin Jun 10 '16 at 20:16
  • It works! Can you please explain why? I mean, as far as I can see in what you changed in the function, you're simply returning an empty array in the event of a null value. How does it end up working, as opposed to not checking for null? Thanks!! – Yotam Ofek Jun 10 '16 at 23:02
  • Your original function was not prepared for null values and threw an exception in `FOREACH`. Your only problem was that `FOREACH` was executed for null input; the fix simply prevents that. – klin Jun 11 '16 at 00:35
  • I like benchmarks. Effective performance depends on many factors - as I am sure you know. You tested with array-length 2. For just two elements, an array type would be a big waste. Separate columns would make more sense. An array makes sense for a sizeable (variable) number of elements. Repeat your benchmark with 20 or 100 elements and you'll see the set-based solution scales much better. (I tested on 9.5!) PL/pgSQL plays into this, too. Sometimes reduces overhead, but can be much more expensive where a simple SQL alternative can be inlined in the context of the outer query. – Erwin Brandstetter Jun 12 '16 at 01:59
  • While being at it, test your function with volatility `STABLE` or `IMMUTABLE`. Depending on the test, this can make a substantial difference ... – Erwin Brandstetter Jun 12 '16 at 02:01
  • I knew that with increasing cardinality of the arrays, the cost of the call of `generate_subscripts()` will be better compensated. I only wanted to point out that categorical statements sometimes tend to be risky. Making the function `immutable` makes sense mainly when we are going to call it from where clause. Anyway I consider our polemic fruitful. – klin Jun 12 '16 at 09:06