3
t1
id|entity_type
9|3
9|4
9|5
2|3
2|5
           
t2  
id|entity_type
1|3
1|4
1|5     

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) = 
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);

Result:

t1.id = 9|array_agg{3,4,5}      
    

I have two tables t1 and t2. I want to get value of t1.id where t1.entity_type array equals t2.entity_type array.

In this scenario everything works fine. For t2.id = 1 I receive t1.id = 9. Both have the same array of entity_type: {3,4,5}

Now I'd like to get t1.id not only for equal sets, but also for smaller sets. If I modify t2 this way:

t2  
id|entity_type
1|3
1|4

and modify query this way:

SELECT t1.id, array_agg(t1.entity_type)
    FROM t1
GROUP BY
    t1.id
HAVING ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type) >= /*MODIFICATION*/
    (SELECT ARRAY_AGG(t2.entity_type ORDER BY t2.entity_type) 
        FROM t2
    WHERE t2.id = 1
    GROUP BY t2.id);
    

I don't receive the expected result:

t1.id = 1 has {3, 4, 5}     
t2.id = 1 has {3, 4}

Arrays in t1 that contain the array in t2 should qualify. I expect to receive results as in first case but I get no rows.
Is there any method like: ARRAY_AGG contains another ARRAY_AGG?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Marcin
  • 57
  • 7
  • 1
    That's not what `>=` does. I think you want `@>`. It's all explained in the docs: http://www.postgresql.org/docs/9.3/static/functions-array.html – Nick Barnes Oct 22 '14 at 14:19
  • Is the combination of `(id, entity_type)` unique in both tables? Ideally, you should provide complete table definitions to clarify the details. And your version of Postgres, of course. – Erwin Brandstetter Oct 22 '14 at 15:40

1 Answers1

5

Clean up

It would be inefficient to use two differing invocations of array_agg(). Use the same (ORDER BY in SELECT list and HAVING clause):

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) = (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   -- GROUP  BY id   -- not needed
   );

Syntax basics in the manual.

"contains" operator @>

Like Nick commented, your 2nd query would work with the "array contains" operator @>

SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM   t1
GROUP  BY 1
HAVING array_agg(entity_type ORDER BY entity_type) @> (
   SELECT array_agg(entity_type ORDER BY entity_type)
   FROM   t2
   WHERE  id = 1
   );

But this is very inefficient for big tables.

Faster query

The problem can be cast as a case of relational division. Depending on your table definition, there are more efficient techniques. We have gathered a whole arsenal under this related question:

Assuming (id, entity_type) is unique in both tables, this should be substantially faster for big tables, especially because it can use an index on t1 (as opposed to your original query):

SELECT t1.id
FROM   t2
JOIN   t1 USING (entity_type)
WHERE  t2.id = 1
GROUP  BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);

You need two indexes:

First on t2(id), typically covered by the primary key.
Second:

CREATE INDEX t1_foo_idx ON t1 (entity_type, id);

The added id column is optional to allow index-only scans. Sequence of columns is essential:

fiddle
Old sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228