6

Suppose I have the following table in Postgres 9.4:

 a | b 
---+---
 1 | 2
 3 | 1
 2 | 3
 1 | 1

If I run

 select array_agg(a) as a_agg, array_agg(b) as b_agg from foo

I get what I want

   a_agg   |   b_agg   
-----------+-----------
 {1,3,2,1} | {2,1,3,1}

The orderings of the two arrays are consistent: the first element of each comes from a single row, as does the second, as does the third. I don't actually care about the order of the arrays, only that they be consistent across columns.

It seems natural that this would "just happen", and it seems to. But is it reliable? Generally, the ordering of SQL things is undefined unless an ORDER BY clause is specified. It is perfectly possible to get postgres to generate inconsistent pairings with inconsistent ORDER BY clauses within array_agg (with some explicitly counterproductive extra work):

select array_agg(a order by b) as agg_a, array_agg(b order by a) as agg_b from foo;

yields

   agg_a   |   agg_b   
-----------+-----------
 {3,1,1,2} | {2,1,3,1}

This is no longer consistent. The first array elements 3 and 2 did not come from the same original row.

I'd like to be certain that, without any ORDER BY clause, the natural thing just happens. Even with an ordering on either column, ambiguity would remain because of the duplicate elements. I'd prefer to avoid imposing an unambiguous sort, because in my real application, the tables will be large and the sorting might be costly. But I can't find any documentation that guarantees or specifies that, absent imposition of inconsistent orderings, multiple array_agg calls will be ordered consistently, even though it'd be very surprising if they weren't.

Is it safe to assume that the ordering of multiple array_agg columns will be consistently ordered when no ordering is explicitly imposed on the query or within the aggregate functions?

Steve Waldman
  • 13,689
  • 1
  • 35
  • 45
  • I think it's safe to assume that the orders will be consistent. Check my answer [here](https://stackoverflow.com/a/65606694/4710226). – ebk Jan 07 '21 at 04:41

2 Answers2

1

According to PostgreSQL documentation :

Ordinarily, the input rows are fed to the aggregate function in an unspecified order. [...]

However, some aggregate functions (such as array_agg and string_agg) produce results that depend on the ordering of the input rows. When using such an aggregate, the optional order_by_clause can be used to specify the desired ordering.

The way I understand it : you can't be sure that the order of rows is preserved unless you use ORDER BY.

  • thanks! do you think that "depend on the ordering of the input rows" implies consistent preservation of that order? – Steve Waldman Apr 01 '16 at 06:05
  • No, i don't. The order is 'unspecified' so it's not necessary the order of rows. – Connected Wanderer Apr 01 '16 at 06:41
  • yeah. i agree. it's not quite clear. it seems likely they mean that, whatever the unspecified order is, the aggregation will happen over all columns in that order. but they don't quite say that. – Steve Waldman Apr 01 '16 at 06:43
0

It seems there is a similar (or almost same) question here: PostgreSQL array_agg order

I prefer ebk's answer

So I think it's fine to assume that all the aggregates, none of which uses ORDER BY, in your query will see input data in the same order. The order itself is unspecified though (which depends on the order the FROM clause supplies rows).

But you can still add order in array_agg function to force same order.

Kn.Bk
  • 21
  • 4