51

(have done this before, but memory fades, as does goggle)

wish to get select from users with the tag.tag_ids for each user returned as an array.

select usr_id,
       name,
       (select t.tag_id from tags t where t.usr_id = u.usr_id) as tag_arr
from   users u;

with the idea embedded query tag_arr would be an array

klin
  • 112,967
  • 15
  • 204
  • 232
cc young
  • 18,939
  • 31
  • 90
  • 148

2 Answers2

87

Use the aggregate function:

select
    usr_id, 
    name, 
    array_agg(tag_id) as tag_arr
from users
join tags using(usr_id)
group by usr_id, name

or an array constructor from the results of a subquery:

select
    u.usr_id, 
    name, 
    array(
        select tag_id 
        from tags t 
        where t.usr_id = u.usr_id
        ) as tag_arr
from users u

The second option is a simple one-source query while the first one is more generic, especially convenient when you need more than one aggregate from a related table. Also, the first variant should be faster on larger tables.

Note, that for better performance the usr_id columns in both tables should be indexed. While typically users.usr_id is a primary key, sometimes one may forget that the index of referencing column is also useful.

klin
  • 112,967
  • 15
  • 204
  • 232
  • Of those two options, can the first one using `array_agg()` have significantly better execution time? I presume it depends on the sizes of the data tables involved. In a real project I just compared the two queries with `explain analyze` and got execution times of `120ms` vs `10s` respectively. I'm not expert at db query plans, so any insight on how these two execute differently would be helpful. – broc.seib Sep 29 '21 at 18:26
  • 1
    @broc.seib - it depends on many factors such as the number of rows in both tables, the average number of tags per user, Postgres version, server and hardware configuration. For larger tables, the first query should be faster than the other. The difference in your case seems too large though. The reason may be the lack of index on `tags(usr_id)`, which would be useful for the first query as well. – klin Sep 29 '21 at 19:35
5

Use the array constructor of PostgreSQL:

select
    usr_id,
    name,
    array(select t.tag_id from tags t where t.usr_id = u.usr_id) as tag_arr
from users u;

Note:

If you're using psycopg2 with python, then the result will be converted to a python list as well! (Although for uuid[] array, you will need to convert it to text[] array using array(...)::text[], if you want to get IDs in python list). See this for details.

Ali Sajjad
  • 3,589
  • 1
  • 28
  • 38