3

I have a table like the following:

| col_A | col_B |
|-------|-------|
| 1     | 1     |
| 1     | 2     |
| 1     | 3     |
| 2     | 1     |
| 2     | 2     |
| 2     | 3     |
| 3     | 1     |
| 3     | 2     |

I want to group and concatenate the results into an array like the following:

| col_A | col_B |
|-------|-------|
| 1,2   | 1,2,3 |
| 3     |  1,2  |

My attempt at writing a query:

SELECT col_A, array_agg(col_B ORDER BY col_B DESC) FROM table GROUP BY col_A;

However, this outputs:

| col_A | col_B   |
|-------|---------|
| 1     | {1,2,3} |
| 2     | {1,2,3} |
| 3     | {1,2}   |
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
MHS
  • 35
  • 1
  • 5

2 Answers2

1

Seems like you want to aggregate col_b grouped by col_a, and then aggregate col_a grouped by the aggregated arrays. You can do this with a nested query, of which you did the first step already:

SELECT array_agg(col_a), col_b
FROM (SELECT col_a, array_agg(DISTINCT col_b ORDER BY col_b) AS col_b
      FROM example GROUP BY col_a) grouped_b
GROUP BY col_b;

(online demo)

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • You might want to remove the `DISTINCT` from the aggregate, not sure whether you need/want that. – Bergi Oct 09 '19 at 23:56
1

In Postgres, ordering rows in a subquery is typically cheaper than adding an ORDER BY clause to the aggregate function:

SELECT array_agg(col_a) AS col_a, col_b
FROM  (
   SELECT col_a, array_agg(col_b) AS col_b
   FROM  (TABLE tbl ORDER BY col_a, col_b) t
   GROUP  BY 1
   ORDER  BY 1
   ) sub
GROUP  BY 2
ORDER  BY 1;

Produces your desired result exactly, including order of elements and rows.

db<>fiddle here

The 1st ORDER BY is to sort elements before constructing the array in col_b.
The 2nd ORDER BY is to sort elements before constructing the array in col_a.
The 3rd ORDER BY is to output rows in the order suggested by the OPs example result.

The query should work the same without 2nd and 3rd ORDER BY, because the 1st order is carried over. But there are no guarantees. (Things like parallelism might mess with order when querying big tables.) The presented query is guaranteed to produce the desired order.

Related:

About the positional references in GROUP BY and ORDER BY:

About (TABLE tbl ORDER BY col_a, col_b):

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • What is this `GROUP BY 1` and `ORDER BY 2` syntax, does that refer to the column indices? And why exactly do you sort them at all (apart from ensuring that the array elements will have the same order in the outer grouping comparison)? – Bergi Oct 10 '19 at 00:06
  • 1
    @Bergi: I added a link to explain positional references. And explanation for each `ORDER BY`. – Erwin Brandstetter Oct 10 '19 at 00:08
  • Thanks, I had never seen ordinal numbers to refer to columns. Makes sense though! – Bergi Oct 10 '19 at 00:22
  • @Bergi: I used them here to avoid more typing. Whether it's easier to read & maintain is a matter of debate. Most useful to avoid repeating lengthy expressions or with dynamic SQL. – Erwin Brandstetter Oct 10 '19 at 00:32