38

I have a table with the an array column type:

 title       tags
"ridealong";"{comedy,other}"
"ridealong";"{comedy,tragedy}"
"freddyjason";"{horror,silliness}"

I would like to write a query that produces a single array per title(in an ideal world it would be a set/deduplicated array)

e.g.

select array_cat(tags),title from my_test group by title

The above query doesn't work of course, but I would like to produce 2 rows:

"ridealong";"{comedy,other,tragedy}"
"freddyjason";"{horror,silliness}"

Any help or pointers would be very much appreciated (I am using Postgres 9.1)


Based on Craig's help I ended up with the following (slightly altered syntax since 9.1 complains about the query exactly as he shows it)

SELECT t1.title, array_agg(DISTINCT tag.tag) 
FROM my_test t1, (select unnest(tags) as tag,title from my_test) as tag 
where tag.title=t1.title
GROUP BY t1.title;
Yana K.
  • 1,926
  • 4
  • 19
  • 27

2 Answers2

37

Custom aggregate

Approach 1: define a custom aggregate. Here's one I wrote earlier.

CREATE TABLE my_test(title text, tags text[]);

INSERT INTO my_test(title, tags) VALUES
('ridealong', '{comedy,other}'),
('ridealong', '{comedy,tragedy}'),
('freddyjason', '{horror,silliness}');

CREATE AGGREGATE array_cat_agg(anyarray) (
  SFUNC=array_cat,
  STYPE=anyarray
);

select title, array_cat_agg(tags) from my_test group by title;

LATERAL query

... or since you don't want to preserve order and want to deduplicate, you could use a LATERAL query like:

SELECT title, array_agg(DISTINCT tag ORDER BY tag) 
FROM my_test, unnest(tags) tag 
GROUP BY title;

in which case you don't need the custom aggregate. This one is probably a fair bit slower for big data sets due to the deduplication. Removing the ORDER BY if not required may help, though.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thanks a lot Craig. In 9.1 I was getting an error "ERROR: function expression in FROM cannot refer to other relations of same query level" so I modified the query (I added at the bottom of my question). If you have any suggestions for improvement I'd love them -- not excited about the join I had to add but could not get the right result otherwise – Yana K. Jun 11 '14 at 14:04
  • Upgrade :p . `LATERAL` was added in 9.3 IIRC. – Craig Ringer Jun 11 '14 at 14:39
  • 2
    From Postgres-14 on, the AGGREGATE definition must use `anycompatiblearray` instead of `anyarray`. – fjf2002 Feb 24 '22 at 17:40
6

The obvious solution would be the LATERAL join (which also suggested by @CraigRinger), but that is added to PostgreSQL in 9.3.

In 9.1 you cannot avoid the sub-query, but you can simplify it:

SELECT title, array_agg(DISTINCT tag)
FROM (SELECT title, unnest(tags) FROM my_test) AS t(title, tag)
GROUP BY title;

SQL Fiddle

pozs
  • 34,608
  • 5
  • 57
  • 63