I have a table storing array elements by the array they belong to and their index in the array. It seemed smart because the arrays were expected to be sparse, and have their elements updated individually. Let's say this is the table:
CREATE TABLE values (
pk TEXT,
i INTEGER,
value REAL,
PRIMARY KEY (pk, i)
);
pk | i | value
----+---+-------
A | 0 | 17.5
A | 1 | 32.7
A | 3 | 5.3
B | 1 | 13.5
B | 2 | 4.8
B | 4 | 89.1
Now I would like to get these as real arrays, i.e. {17.5, 32.7, NULL, 53}
for A and {NULL, 13.5, 4.8, NULL, 89.1}
for B.
I would have expected that it's easily possible with a grouping query
and an appropriate aggregate function. However, it turned out that there
is no such function that puts elements into an array by its index (or
subscript, as postgres calls it). It would've been much simpler if the
elements were successive - I just could've used array_agg
with
ORDER BY i
. But I want the null values in the result
arrays.
What I ended up with was this monster:
SELECT
pk,
ARRAY( SELECT
( SELECT value
FROM values innervals
WHERE innervals.pk = outervals.pk AND i = generate_series
)
FROM generate_series(0, MAX(i))
ORDER BY generate_series -- is this really necessary?
)
FROM values outervals
GROUP BY pk;
Having to SELECT … FROM values
twice is ugly, and the query planner doesn't seem to be able to optimise this.
Is there a simple way to refer to the grouped rows as a relation in a subquery, so that I could just SELECT value FROM generate_series(0, MAX(i)) LEFT JOIN ???
?
Would it be more appropriate to solve this by defining a custom aggregate function?
Edit: It seems what I was looking for is possible with multiple-argument unnest
and array_agg
, although it is not particularly elegant:
SELECT
pk,
ARRAY( SELECT val
FROM generate_series(0, MAX(i)) AS series (series_i)
LEFT OUTER JOIN
unnest( array_agg(value ORDER BY i),
array_agg(i ORDER BY i) ) AS arr (val, arr_i)
ON arr_i = series_i
ORDER BY series_i
)
FROM values
GROUP BY pk;
The query planner even seems to realise that it can do a sorted merge , although I need to put some more effort in really understanding the JOIN
on the sorted series_i
and arr_i
EXPLAIN
output. Edit 2: It's actually a hash join between series_i
and arr_i
, only the outer group aggregation uses a "sorted" strategy.