SQL Select all rows where subset exists

Question

I'm sure there is an answer present for this question but bear with me as I'm new to SQL and am not sure how to ask the question.

I have data like this (this is shorthand purely for example). This is in a postgres db.

table1
id    value
1     111
1     112
1     113
2     111
2     112
2     116
3     111
3     122
3     123
4     126
5     123
5     125
6     111
6     112
6     116

table2
value
111
112
116

I need return the id of table1 where all values in table2 exist in the values of table1. So for this example, my query would return 2 and 6.

Is there any way to do this in SQL? Or could you possibly guide me to a data structure that would allow for me to get this result? I am able to change up the structure of either table to accommodate the ultimate need of getting this result

Thank you so much. An answer to this would be a life saver.

Looks to me like the answer of @ErwinBrandstetter did already "save your life". Could you please mind to either accept it as the correct answer or comment on any problems with this answer. — tscho, Dec 08 '11 at 11:20
See this question with various ways to solve this relational division problem: http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation — ypercubeᵀᴹ, Dec 08 '11 at 12:34

Erwin Brandstetter · Accepted Answer · 2011-12-08T02:40:57.290

6

Consider this demo:

CREATE TEMP TABLE table1(id int, value int);
INSERT INTO table1 VALUES
 (1,111),(1,112),(1,113)
,(2,111),(2,112),(2,116)
,(3,111),(3,122),(3,123)
,(4,126)
,(5,123),(5,125)
,(6,111),(6,112),(6,116);

CREATE TEMP TABLE table2(value int);
INSERT INTO table2 VALUES
 (111)
,(112)
,(116);

SELECT t1.id
FROM   table1 t1
JOIN   table2 t2 USING (value)
GROUP  BY t1.id
HAVING count(*) = (SELECT count(*) FROM table2)
ORDER  BY t1.id;

Result:

id
-----
2
6

Returns all ids of table1 that appear with all values provided by table2 once.
Works for any number of rows in both tables.

If duplicate rows can appear in table1 make that:

HAVING count(DISTINCT value) = (SELECT count(*) FROM table2)

edited Dec 08 '11 at 02:40

answered Dec 08 '11 at 02:14

Erwin Brandstetter

605,456
145
1,078
1,228

1

Didn't you think to mention [this answer](http://stackoverflow.com/a/7774879/15354)? – onedaywhen Dec 08 '11 at 13:56
@onedaywhen: You are right, it's closely related. This case is somewhat simplified in comparison. – Erwin Brandstetter Dec 08 '11 at 17:53

score 3 · Answer 2 · answered Dec 08 '11 at 14:10

It seems to me that as much as anything you want to know how to ask the right question. The magic words here are "relational division".

It is one of the operators in Codd's relational algebra and there have been several variations proposed since. Most recently, Chris Date has proposed replacing the whole concept with image relations.

SQL has no explicit divide operator. There are a number of workarounds using other operator and the most appropriate will depend on your requirements, including exact division or division with remainder and how to handle an empty divisor. Then there are the usual considerations: SQL product and version, performance, personal style and taste, etc.

Here are a couple of articles which should help you with these choices:

On Making Relational Division Comprehensible

Divided We Stand: The SQL of Relational Division

+1 Very useful entrance points on the topic of relational division. — Erwin Brandstetter, Dec 09 '11 at 01:17

doctore · Answer 3 · 2011-12-08T13:39:38.423

1

UPDATE Another possibility:

SELECT t1.id
FROM (SELECT t1.id, t1.value
      FROM table1 t1
      JOIN  table2 t2 USING (value)
      GROUP BY t1.id, t1.value
      ORDER BY t1.id) t1
GROUP BY t1.id      
HAVING COUNT(*) = (SELECT COUNT(*) FROM table2)

The cost of my answer, if you use EXPLAIN ANALYZE is always 893-900, even with repeated rows.

edited Dec 08 '11 at 13:39

answered Dec 08 '11 at 12:10

doctore

3,855
2
29
45

This query might cost lower than Erwin's but it is not a correct solution. If there was a `(7, 111)` row in the OP's example, the output would contain `id` `7`, even though there was no `(7, 112)` and `(7, 116)` rows. – Andriy M Dec 08 '11 at 12:41

score 0 · Answer 4 · answered Aug 31 '16 at 12:59

NOT EXISTS(... NOT EXISTS) is a standard solution to relational division:

SELECT DISTINCT id
FROM table1 t1
WHERE NOT EXISTS (
        SELECT * FROM table2 t2
        WHERE NOT EXISTS (
                SELECT * FROM table1 t1x
                WHERE t1x.value = t2.value
                AND t1x.id = t1.id
                )
        )
        ;

In this case, the DISTINCT is needed because we don't have access to the domain table with ids, only to the junction table t1 referring to it.

SQL Select all rows where subset exists

4 Answers4