I changed your column name group to grp because group is a reserved word in Postgres and in every SQL standard, and shouldn't be used as an identifier.
I understand your question like this:
Get the two arrays sorted in identical sort order so that the same element position corresponds to the same row in both arrays.
Use a subquery or CTE and order the rows before you aggregate.
SELECT id, array_agg(grp) AS grp, array_agg(dt) AS dt
FROM  (
   SELECT *
   FROM   tbl
   ORDER  BY id, grp, dt
   ) x
GROUP  BY id;
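To try this out, here is a minimal sketch with made-up sample data. The column types (int, text, date) and all values are my assumptions, not taken from your question:

```sql
-- Hypothetical test table: types and values are assumed for illustration.
-- A temp table shadows any regular table named tbl for this session only.
CREATE TEMP TABLE tbl (id int, grp text, dt date);

INSERT INTO tbl (id, grp, dt) VALUES
  (1, 'b', '2012-01-02')
, (1, 'a', '2012-01-03')
, (1, 'a', '2012-01-01')
, (2, 'c', '2012-01-05')
, (2, 'b', '2012-01-04');

-- Same query as above: sort in the subquery, then aggregate
SELECT id, array_agg(grp) AS grp, array_agg(dt) AS dt
FROM  (
   SELECT *
   FROM   tbl
   ORDER  BY id, grp, dt
   ) x
GROUP  BY id;
```

The intended result for id = 1 is grp = {a,a,b} and dt = {2012-01-01,2012-01-03,2012-01-02}: element N of one array belongs to the same row as element N of the other.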
That's faster than using individual ORDER BY clauses in the aggregate function array_agg() like @Mosty demonstrates (a feature available since PostgreSQL 9.0). Mosty also interprets your question differently and uses the proper tools for his interpretation.
Is ORDER BY in a subquery safe?
The manual:
The aggregate functions array_agg, json_agg, [...] as well as
similar user-defined aggregate functions, produce meaningfully
different result values depending on the order of the input values.
This ordering is unspecified by default, but can be controlled by
writing an ORDER BY clause within the aggregate call, as shown in
Section 4.2.7. Alternatively, supplying the input values from a
sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
Beware that this approach can fail if the outer query level contains
additional processing, such as a join, because that might cause the
subquery's output to be reordered before the aggregate is computed.
So yes, it's safe in the example.
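If the outer query level does get more complex, for instance with a join, the fallback is to move the sort into the aggregate calls, where the order is guaranteed. A sketch, assuming a hypothetical second table tbl2 with a matching id column and a label column (both invented for illustration):

```sql
-- tbl2 and its column "label" are hypothetical, only here to add a join
-- at the same query level as the aggregates.
SELECT t.id
     , t2.label
     , array_agg(t.grp ORDER BY t.grp, t.dt) AS grp
     , array_agg(t.dt  ORDER BY t.grp, t.dt) AS dt
FROM   tbl  t
JOIN   tbl2 t2 USING (id)
GROUP  BY t.id, t2.label;
```

Both aggregates use the same ORDER BY here, so the element positions stay in sync no matter how the join reorders the rows.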
Without subquery
If you really need a solution without a subquery, you can use:
SELECT id
, array_agg(grp ORDER BY grp)
, array_agg(dt ORDER BY grp, dt)
FROM tbl
GROUP BY id;
Note the ORDER BY grp, dt. I sort by dt in addition to break ties and make the sort order unambiguous. That's not necessary for the grp array, though: tied grp values are identical, so their relative order doesn't change the result.
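To check that the positions really line up, you can unnest both arrays in parallel and compare against the base table. This is only a sketch: it assumes Postgres 9.4 or later for multi-argument unnest() and WITH ORDINALITY, and the aliases s, u and pos are my own:

```sql
-- Unnest both arrays side by side; pos is the element position.
-- Every (id, grp, dt) row returned should also exist in tbl.
SELECT s.id, u.pos, u.grp, u.dt
FROM  (
   SELECT id
        , array_agg(grp ORDER BY grp)     AS grp
        , array_agg(dt  ORDER BY grp, dt) AS dt
   FROM   tbl
   GROUP  BY id
   ) s
CROSS  JOIN LATERAL unnest(s.grp, s.dt) WITH ORDINALITY AS u(grp, dt, pos)
ORDER  BY s.id, u.pos;
```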
There is also a completely different way to do this, with window functions:
SELECT DISTINCT ON (id)
       id
     , array_agg(grp) OVER w AS grp
     , array_agg(dt)  OVER w AS dt
FROM   tbl
WINDOW w AS (PARTITION BY id ORDER BY grp, dt
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY id;
Note the DISTINCT ON (id) instead of plain DISTINCT. It produces the same result but performs faster by an order of magnitude because it doesn't need an extra sort.
I ran some tests and this is almost as fast as the other two solutions. As expected, the subquery version was still fastest. Test with EXPLAIN ANALYZE to see for yourself.
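A sketch of such a comparison, to be run against your own table. Note that EXPLAIN ANALYZE actually executes the query, and timings vary between runs, so repeat each one a few times:

```sql
-- Variant 1: sorted subquery
EXPLAIN ANALYZE
SELECT id, array_agg(grp) AS grp, array_agg(dt) AS dt
FROM  (
   SELECT *
   FROM   tbl
   ORDER  BY id, grp, dt
   ) x
GROUP  BY id;

-- Variant 2: ORDER BY inside the aggregate calls
EXPLAIN ANALYZE
SELECT id
     , array_agg(grp ORDER BY grp)
     , array_agg(dt  ORDER BY grp, dt)
FROM   tbl
GROUP  BY id;
```

Compare the total runtime reported at the bottom of each plan, and look at where the sort steps appear.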