24

I have a table schema which includes an int array column, and a custom aggregate function which sums the array contents. In other words, given the following:

CREATE TABLE foo (stuff INT[]);

INSERT INTO foo VALUES ({ 1, 2, 3 });
INSERT INTO foo VALUES ({ 4, 5, 6 });

I need a "sum" function that would return { 5, 7, 9 }. The PL/pgSQL version, which works correctly, is as follows:

CREATE OR REPLACE FUNCTION array_add(array1 int[], array2 int[]) RETURNS int[] AS $$
DECLARE
    result int[] := ARRAY[]::integer[];
    l int;
BEGIN
  ---
  --- First check if either input is NULL, and return the other if it is
  ---
  IF array1 IS NULL OR array1 = '{}' THEN
    RETURN array2;
  ELSEIF array2 IS NULL OR array2 = '{}' THEN
    RETURN array1;
  END IF;

  l := array_upper(array2, 1);

  SELECT array_agg(array1[i] + array2[i]) FROM generate_series(1, l) i INTO result;

  RETURN result;
END;
$$ LANGUAGE plpgsql;

Coupled with:

CREATE AGGREGATE sum (int[])
(
    sfunc = array_add,
    stype = int[]
);

With a data set of about 150,000 rows, SELECT SUM(stuff) takes over 15 seconds to complete.

I then re-wrote this function in C, as follows:

#include <postgres.h>
#include <fmgr.h>
#include <utils/array.h>

Datum array_add(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(array_add);

/**
 * Returns the sum of two int arrays.
 */
Datum
array_add(PG_FUNCTION_ARGS)
{
  // The formal PostgreSQL array objects:
  ArrayType *array1, *array2;

  // The array element types (should always be INT4OID):
  Oid arrayElementType1, arrayElementType2;

  // The array element type widths (should always be 4):
  int16 arrayElementTypeWidth1, arrayElementTypeWidth2;

  // The array element type "is passed by value" flags (not used, should always be true):
  bool arrayElementTypeByValue1, arrayElementTypeByValue2;

  // The array element type alignment codes (not used):
  char arrayElementTypeAlignmentCode1, arrayElementTypeAlignmentCode2;

  // The array contents, as PostgreSQL "datum" objects:
  Datum *arrayContent1, *arrayContent2;

  // List of "is null" flags for the array contents:
  bool *arrayNullFlags1, *arrayNullFlags2;

  // The size of each array:
  int arrayLength1, arrayLength2;

  Datum* sumContent;
  int i;
  ArrayType* resultArray;


  // Extract the PostgreSQL arrays from the parameters passed to this function call.
  array1 = PG_GETARG_ARRAYTYPE_P(0);
  array2 = PG_GETARG_ARRAYTYPE_P(1);

  // Determine the array element types.
  arrayElementType1 = ARR_ELEMTYPE(array1);
  get_typlenbyvalalign(arrayElementType1, &arrayElementTypeWidth1, &arrayElementTypeByValue1, &arrayElementTypeAlignmentCode1);
  arrayElementType2 = ARR_ELEMTYPE(array2);
  get_typlenbyvalalign(arrayElementType2, &arrayElementTypeWidth2, &arrayElementTypeByValue2, &arrayElementTypeAlignmentCode2);

  // Extract the array contents (as Datum objects).
  deconstruct_array(array1, arrayElementType1, arrayElementTypeWidth1, arrayElementTypeByValue1, arrayElementTypeAlignmentCode1,
&arrayContent1, &arrayNullFlags1, &arrayLength1);
  deconstruct_array(array2, arrayElementType2, arrayElementTypeWidth2, arrayElementTypeByValue2, arrayElementTypeAlignmentCode2,
&arrayContent2, &arrayNullFlags2, &arrayLength2);

  // Create a new array of sum results (as Datum objects).
  sumContent = palloc(sizeof(Datum) * arrayLength1);

  // Generate the sums.
  for (i = 0; i < arrayLength1; i++)
  {
    sumContent[i] = arrayContent1[i] + arrayContent2[i];
  }

  // Wrap the sums in a new PostgreSQL array object.
  resultArray = construct_array(sumContent, arrayLength1, arrayElementType1, arrayElementTypeWidth1, arrayElementTypeByValue1, arrayElementTypeAlignmentCode1);

  // Return the final PostgreSQL array object.
  PG_RETURN_ARRAYTYPE_P(resultArray);
}

This version takes only 800 ms to complete, which is.... much better.

(Converted to a stand-alone extension here: https://github.com/ringerc/scrapcode/tree/master/postgresql/array_sum)

My question is, why is the C version so much faster? I expected an improvement, but 20x seems a bit much. What's going on? Is there something inherently slow about accessing arrays in PL/pgSQL?

I'm running PostgreSQL 9.0.2, on Fedora Core 8 64-bit. The machine is a High-Memory Quadruple Extra-Large EC2 instance.

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Matt Solnit
  • 32,152
  • 8
  • 53
  • 57
  • 1
    +1 Very interesting Q with all the details needed. – Erwin Brandstetter Jun 07 '13 at 23:35
  • Erwin's absolutely right about *needing* to update to at least 9.0.13, by the way. You're vulnerable to a huge security hole (http://www.postgresql.org/support/security/faq/2013-04-04/), among other things, as well as some significant bugs. I do not understand why people keep running such old point releases. – Craig Ringer Jun 09 '13 at 09:17

2 Answers2

25

Why?

why is the C version so much faster?

A PostgreSQL array is its self a pretty inefficient data structure. It can contain any data type and it's capable of being multi-dimensional, so lots of optimisations are just not possible. However, as you've seen it's possible to work with the same array much faster in C.

That's because array access in C can avoid a lot of the repeated work involved in PL/PgSQL array access. Just take a look at src/backend/utils/adt/arrayfuncs.c, array_ref. Now look at how it's invoked from src/backend/executor/execQual.c in ExecEvalArrayRef. Which runs for each individual array access from PL/PgSQL, as you can see by attaching gdb to the pid found from select pg_backend_pid(), setting a breakpoint at ExecEvalArrayRef, continuing, and running your function.

More importantly, in PL/PgSQL every statement you execute is run through the query executor machinery. This makes small, cheap statements fairly slow even allowing for the fact that they're pre-prepared. Something like:

a := b + c

is actually executed by PL/PgSQL more like:

SELECT b + c INTO a;

You can observe this if you turn debug levels high enough, attach a debugger and break at a suitable point, or use the auto_explain module with nested statement analysis. To give you an idea of how much overhead this imposes when you're running lots of tiny simple statements (like array accesses), take a look at this example backtrace and my notes on it.

There is also a significant start-up overhead to each PL/PgSQL function invocation. It isn't huge, but it's enough to add up when it's being used as an aggregate.

A faster approach in C

In your case I would probably do it in C, as you have done, but I'd avoid copying the array when called as an aggregate. You can check for whether it's being invoked in aggregate context:

if (AggCheckCallContext(fcinfo, NULL))

and if so, use the original value as a mutable placeholder, modifying it then returning it instead of allocating a new one. I'll write a demo to verify that this is possible with arrays shortly... (update) or not-so-shortly, I forgot how absolute horrible working with PostgreSQL arrays in C is. Here we go:

// append to contrib/intarray/_int_op.c

PG_FUNCTION_INFO_V1(add_intarray_cols);
Datum           add_intarray_cols(PG_FUNCTION_ARGS);

Datum
add_intarray_cols(PG_FUNCTION_ARGS)
{
    ArrayType  *a,
           *b;

    int i, n;

    int *da,
        *db;

    if (PG_ARGISNULL(1))
        ereport(ERROR, (errmsg("Second operand must be non-null")));
    b = PG_GETARG_ARRAYTYPE_P(1);
    CHECKARRVALID(b);

    if (AggCheckCallContext(fcinfo, NULL))
    {
        // Called in aggregate context...
        if (PG_ARGISNULL(0))
            // ... for the first time in a run, so the state in the 1st
            // argument is null. Create a state-holder array by copying the
            // second input array and return it.
            PG_RETURN_POINTER(copy_intArrayType(b));
        else
            // ... for a later invocation in the same run, so we'll modify
            // the state array directly.
            a = PG_GETARG_ARRAYTYPE_P(0);
    }
    else 
    {
        // Not in aggregate context
        if (PG_ARGISNULL(0))
            ereport(ERROR, (errmsg("First operand must be non-null")));
        // Copy 'a' for our result. We'll then add 'b' to it.
        a = PG_GETARG_ARRAYTYPE_P_COPY(0);
        CHECKARRVALID(a);
    }

    // This requirement could probably be lifted pretty easily:
    if (ARR_NDIM(a) != 1 || ARR_NDIM(b) != 1)
        ereport(ERROR, (errmsg("One-dimesional arrays are required")));

    // ... as could this by assuming the un-even ends are zero, but it'd be a
    // little ickier.
    n = (ARR_DIMS(a))[0];
    if (n != (ARR_DIMS(b))[0])
        ereport(ERROR, (errmsg("Arrays are of different lengths")));

    da = ARRPTR(a);
    db = ARRPTR(b);
    for (i = 0; i < n; i++)
    {
            // Fails to check for integer overflow. You should add that.
        *da = *da + *db;
        da++;
        db++;
    }

    PG_RETURN_POINTER(a);
}

and append this to contrib/intarray/intarray--1.0.sql:

CREATE FUNCTION add_intarray_cols(_int4, _int4) RETURNS _int4
AS 'MODULE_PATHNAME'
LANGUAGE C IMMUTABLE;

CREATE AGGREGATE sum_intarray_cols(_int4) (sfunc = add_intarray_cols, stype=_int4);

(more correctly you'd create intarray--1.1.sql and intarray--1.0--1.1.sql and update intarray.control. This is just a quick hack.)

Use:

make USE_PGXS=1
make USE_PGXS=1 install

to compile and install.

Now DROP EXTENSION intarray; (if you already have it) and CREATE EXTENSION intarray;.

You'll now have the aggregate function sum_intarray_cols available to you (like your sum(int4[]), as well as the two-operand add_intarray_cols (like your array_add).

By specializing in integer arrays a whole bunch of complexity goes away. A bunch of copying is avoided in the aggregate case, since we can safely modify the "state" array (the first argument) in-place. To keep things consistent, in the case of non-aggregate invocation we get a copy of the first argument so we can still work with it in-place and return it.

This approach could be generalised to support any data type by using the fmgr cache to look up the add function for the type(s) of interest, etc. I'm not particularly interested in doing that, so if you need it (say, to sum columns of NUMERIC arrays) then ... have fun.

Similarly, if you need to handle dissimilar array lengths, you can probably work out what to do from the above.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • @ErwinBrandstetter Well, as you said, it's a good question. Deserves a little effort. – Craig Ringer Jun 09 '13 at 09:15
  • This is great stuff. Thank you very much! I'll try out your alternate C implementation when I have a chance. – Matt Solnit Jun 11 '13 at 05:56
  • @MattSolnit You might also find it interesting to do a run under profiler, so you can see where the time's being spent. Who knows, you might find something you can contribute back. – Craig Ringer Jun 11 '13 at 06:00
  • Does `b = PG_GETARG_ARRAYTYPE_P(1)` need to be freed later? I see in intarray's `icount` that `PG_FREE_IF_COPY(a, 0);` is used after a similar call. – beldaz Aug 27 '16 at 22:07
  • 1
    @beldaz In PostgreSQL you can usually rely on the memory context taking care of minor leaks like that, and freeing the whole memory context is generally considerably faster than retail pfree()ing too. But if you expect the array to possibly be large then it may be worth freeing it, yes. – Craig Ringer Aug 29 '16 at 01:53
  • Hey @CraigRinger, I'm wondering, why do you access array elements like `*da = *da + *db;` instead of `da[i] = da[i] + db[i];`, I changed brackets for pointers in a C function of my own and it goes 30% faster! – ruloweb Jun 03 '19 at 12:41
13

PL/pgSQL excels as server-side glue for SQL elements. Procedural elements and lots of assignments are not among its strengths. Assignments, tests or looping are comparatively expensive and only warranted if they help to take shortcuts one could not achieve with just SQL. The same logic implemented in C will always be faster, but you seem to be well aware of that ...

Typically, a pure SQL solution is faster. Compare this simple, equivalent solution in your test setup:

SELECT array_agg(a + b)
FROM  (
   SELECT unnest('{1, 2, 3 }'::int[]) AS a
        , unnest('{4, 5, 6 }'::int[]) AS b
   ) x;

You can wrap it into a simple SQL function. Or, for best performance, integrate it into your big query directly:

SELECT tbl_id, array_agg(a + b)
FROM  (
   SELECT tbl_id
        , unnest(array1) AS a
        , unnest(array2) AS b
   FROM   tbl
   ORDER  BY tbl_id
   ) x
GROUP  BY tbl_id;

Set-returning functions only work in parallel in a SELECT if the number of returned rows is identical. I.e.: works only for arrays of equal length. The behavior has eventually been sanitized in Postgres 10. See:

It's generally best to test with a current version of Postgres. At least update to the latest point release (9.0.15 at the time of writing). Might be part of the explanation for the big difference in performance.

Postgres 9.4

There is a cleaner solution for unnesting in parallel now:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228