29

So I have this table:

create table test (
   id integer, 
   rank integer,
   image varchar(30)
); 

Then some values:

id | rank | image   
---+------+-------  
 1 |    2 | bbb  
 1 |    3 | ccc  
 1 |    1 | aaa  
 2 |    3 | c  
 2 |    1 | a  
 2 |    2 | b  

I want to group them by id and concatenate the image name in the order given by rank. In mySQL I can do this:

  select id, 
         group_concat( image order by rank asc separator ',' ) 
    from test 
group by id;

And the output would be:

1 aaa,bbb,ccc
2 a,b,c
Is there a way I can have this in postgresql?

If I try to use array_agg() the names will not show in the correct order and apparently I was not able to find a way to sort them. (I was using postgres 8.4 )

MWiesner
  • 8,868
  • 11
  • 36
  • 70
user491575
  • 563
  • 1
  • 6
  • 6

1 Answers1

49

In PostgreSQL 8.4 you cannot explicitly order array_agg but you can work around it by ordering the rows passed into to the group/aggregate with a subquery:

SELECT id, array_to_string(array_agg(image), ',')
FROM (SELECT * FROM test ORDER BY id, rank) x
GROUP BY id;

In PostgreSQL 9.0 aggregate expressions can have an ORDER BY clause:

SELECT id, array_to_string(array_agg(image ORDER BY rank), ',')
FROM test
GROUP BY id;
Jason Weathered
  • 7,762
  • 2
  • 41
  • 37
  • Actually if I add more rows to my test table this solution does not work. Surprisingly enough is not working for me even if I load the table in the correct order. However your solution works in postgresql 8.4 only if I sort the subquery by both id and rank. weird if you ask me – user491575 Oct 31 '10 at 08:38
  • The workaround is a bit hacky and it appears to show for you. I suspect there's a query plan difference causing this. I'm somewhat curious as to what your `EXPLAIN` output is with the bad result. In any case, I've updated my answer with `ORDER BY id, rank`. – Jason Weathered Nov 01 '10 at 02:11
  • First I want to thank you for the answer. It does solve the problem I had. – user491575 Nov 01 '10 at 15:31
  • Second running a query on a small test set seem to be irrelevant. For example, I added 2 indexes to our test table , one on id, one on rank, and it works fine. On my real data though I still have to order by both id and rank like I said. And take a look at the plans for both cases: – user491575 Nov 01 '10 at 15:36
  • 1
    plan for test with 2 indexes: HashAggregate (cost=1.29..1.40 rows=7 width=82) -> Sort (cost=1.17..1.19 rows=7 width=86) Sort Key: test.rank -> Seq Scan on test (cost=0.00..1.07 rows=7 width=86) – user491575 Nov 01 '10 at 15:36
  • I have the same 2 indexes on my real data as well but the plan is different here: GroupAggregate (cost=41962.39..42622.69 rows=200 width=328) -> Sort (cost=41962.39..42181.16 rows=87507 width=328) Sort Key: x.id -> Subquery Scan x (cost=0.00..8452.33 rows=87507 width=328) -> Index Scan using tt2_idx on sequence (cost=0.00..7577.26 rows=87507 width=89) – user491575 Nov 01 '10 at 15:37
  • As I suspected, the real data uses a different query plan. With a small amount of data, it's actually quicker to just scan the whole table than to use the index. This is reflected in your `EXPLAIN` output. p.s. Something to take note of when benchmarking, etc is to run an `ANALYZE` on your table after making data changes to make sure the statistics are updated. – Jason Weathered Nov 08 '10 at 02:07
  • 6
    In Postgres 9.0 you don't need `array_agg` at all. You can use `string_agg(image, ',' order by rank)` –  Apr 11 '14 at 14:10