4

MySQL 5.5

parent table:
id | facts
child table:
parent_id | foreign_key | facts

Now, I want to find parents that have a certain exact set of children, no more, no less. Something like:

SELECT t1.`id` 
from `parent_table` t1 
  LEFT JOIN `child_table` t2 ON t1.id=t2.parent_id
WHERE t2.`fk` = 1 
  AND t2.`fk` = 3  
  AND t2.`fk` = 5 
  AND t2.`fk` = 7 
  AND t2.`fk` = 9

But this will also get a parent record with this set of children: 1,2,3,5,7,9. And I only want those parents that have the exact set of children: 1,3,5,7,9.

Is there a way?

EDIT: child.parent_id and child.fk are both not unique. child.fk is a foreign key linking to another table. ("many-to-many relationship") So it is quite possible for a parent to have children 1,2,3,5,7,9. My whole reason for doing this query is to try to avoid creating a new parent for 1,3,5,7,9 if such a parent already exists.

Buttle Butkus
  • 9,206
  • 13
  • 79
  • 120

5 Answers5

4

Assuming that child.id is unique for every child.parent_id.

SELECT  a.id, a.facts
FROM    parent a
        INNER JOIN child b
            ON a.id = b.parent_ID
WHERE   b.id IN (1,3,5,7,9) AND        -- <<== list all ChildID here
        EXISTS                         -- <<== this part checks if the parent_ID
        (                              --           present on the EXISTS clause
            SELECT  parent_ID          --           which only filters parents
            FROM    child c            --           with 5 children
            WHERE   b.parent_ID = c.parent_ID
            GROUP   BY parent_ID
            HAVING  COUNT(*) = 5       -- <<== total number of children
        )
GROUP   BY a.id, a.facts
HAVING  COUNT(*) = 5                   -- <<== total number of children
John Woo
  • 258,903
  • 69
  • 498
  • 492
  • I think relying on the count by itself would be unreliable. eggyal's solution looks more robust because it also checks for NOT values. – Buttle Butkus May 19 '13 at 07:19
  • why do you think `COUNT(*)` is not reliable? do I need to check for `NOT` values? try running the script and see the result without checking for `NOT` values. – John Woo May 19 '13 at 07:21
  • It doesn't seem to work. Here check this fiddle that I updated to use your query with the updated schema: http://www.sqlfiddle.com/#!2/d8e2a/5 – Buttle Butkus May 19 '13 at 07:51
  • you are querying on the wrong column. it should be `b.fk` not `b.parent_ID`. see here: http://www.sqlfiddle.com/#!2/d8e2a/11 – John Woo May 19 '13 at 07:54
1
SELECT   parent_id
FROM     child_table
GROUP BY parent_id
HAVING   SUM(id IN (1,3,5,7,9)) = COUNT(*)
     AND COUNT(DISTINCT id) = 5
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • I revised my question a bit. The `child` table is a join table and so rather than `id` it should have been `fk`. `fk` can of course have duplicates. Sorry if that was confusing. – Buttle Butkus May 19 '13 at 07:27
  • @ButtleButkus what we meant is a compound unique column. `UNIQUE(parent_ID, child_ID)` and not `UNIQUE(Parent_ID) and UNIQUE(Child_ID)`. both are different. – John Woo May 19 '13 at 07:29
  • What I want is to be able to have every set of children have a unique parent. I don't want to have 2 parents with the same exact set of children. So parent_id = 1 could have children = 1, 3, 5. And parent_id = 2 could have 1,3,5,6 or 1,2,3,5 or whatever. I'm trying to enforce "set uniqueness". – Buttle Butkus May 19 '13 at 07:29
  • @JW웃 yes UNIQUE(parent_ID, child_ID) is the way it will be. – Buttle Butkus May 19 '13 at 07:30
  • @eggyal I don't think it's quite working. Check out this fiddle: http://www.sqlfiddle.com/#!2/7496a/4 If you add a "2" into the `IN` section, it returns 1 and 2. It should only return 1 in that case. – Buttle Butkus May 19 '13 at 07:38
  • Yes, it needs `HAVING SUM(id IN (1,3,5,7,9)) = 5` where `=5` is the number of the ids you are counting. So, with 2, it should be: `HAVING SUM(id IN (1,2,3,5,7,9)) = 6` – ypercubeᵀᴹ May 19 '13 at 07:41
  • I don't understand how summing an `IN` statement would work. See this fiddle: http://www.sqlfiddle.com/#!2/d8e2a/10 It returns 1 and 2 but it should only return 1 – Buttle Butkus May 19 '13 at 07:55
  • @ButtleButkus: Ugh. My bad. You need the `COUNT(DISTINCT id) = 5` after all. See my edit and [sqlfiddle](http://www.sqlfiddle.com/#!2/d8e2a/13/0). "*Summing an `IN` statement*" works in MySQL (which doesn't have true boolean types) because the result of the `IN` is either 1 if true or 0 if false. – eggyal May 19 '13 at 07:57
1

This problem is called (exact) relational division. There is a lot of useful code and explanation in this article: Divided We Stand: The SQL of Relational Division.

One way to solve it:

SELECT p.id AS parent_id
FROM parent AS p
WHERE EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk = 1 AND c.parent_id = p.id)
  AND EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk = 3 AND c.parent_id = p.id)
  AND EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk = 5 AND c.parent_id = p.id)
  AND EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk = 7 AND c.parent_id = p.id)
  AND EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk = 9 AND c.parent_id = p.id)
  AND NOT EXISTS
      ( SELECT * FROM child AS c
        WHERE c.fk NOT IN (1,3,5,7,9) AND c.parent_id = p.id) ;

And another link to a similar question, here at StackOverflow, where you'll find more than 10 different solutions (note: it's not for the exact division but for the division with remainder) and performance tests (for Postgres): How to filter SQL results in a has-many-through relation

Community
  • 1
  • 1
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • You didn't define table alias "c" – Buttle Butkus May 19 '13 at 07:33
  • Oh you did indeed. This works but I'm hoping maybe something more like eggyal's solution will work, since it's much shorter and doesn't involve potentially dozens of subqueries. – Buttle Butkus May 19 '13 at 07:40
  • There are many solutions similar to eggyal's. Don't trust though that "shorter=more efficient". Test to be sure. – ypercubeᵀᴹ May 19 '13 at 07:50
  • Thanks for the link. It looks interesting. I'll probably need to sleep on this one. – Buttle Butkus May 19 '13 at 07:52
  • hi @ypercube. this is off-topic but maybe you have an idea. I was answering last time but Id eleted it because it was massively downvoted because of using `LEFT JOIN...IS NULL`. Why is `NOT EXISTS` more efficient than `LEFT JOIN..IS NULL`? I did some test but all of them performs the same. – John Woo May 19 '13 at 08:06
  • @JW웃 I guess you mean `NOT EXISTS` vs `LEFT JOIN ... IS NULL`. They usually have similar efficiency in MySQL, as far as I know. – ypercubeᵀᴹ May 19 '13 at 08:08
1

Similar to eggyal's solution, but just thought I'd throw it in as an alternative since it should be more portable across RDBMS's;

SELECT c.parent_id
FROM child_table c
GROUP BY c.parent_id
HAVING SUM(CASE WHEN c.id IN (1,3,5,7,9) THEN 1 ELSE -1 END) = 5

5 being the exact count of children in the IN clause you want matched (in this case all)

This will only work with distinct children, if there are duplicates, it will break.

An SQLfiddle to test with.

Joachim Isaksson
  • 176,943
  • 25
  • 281
  • 294
  • MySQL does not have `EXCEPT` (only Postgres, DB2 and SQL-Server do, and Oracle as `MINUS`.) – ypercubeᵀᴹ May 19 '13 at 08:29
  • @ypercube *doh* Tried switching between databases and ended up testing on SQL server :) Removing until I fixed it... – Joachim Isaksson May 19 '13 at 08:31
  • @ypercube: Oracle has it as well (it's called `MINUS` there) and so does DB2 (and several others, e.g. Derby, H2,...) –  May 19 '13 at 08:33
  • @a_horse_with_no_name Yep, thnx, I missed DB2. Which makes almost all DBMS (well, the most common ones) have `EXCEPT`, except MySQL :) – ypercubeᵀᴹ May 19 '13 at 08:34
  • @ypercube: that's the world of MySQL - stuck in the 90s with their SQL features. –  May 19 '13 at 08:42
0

I just had to solve a more general case of this problem, but in SQL server. The principles are probably similar though.

SetX
  |-- Child1
  |-- Child2
  |-- Child4

SetY
  |-- Child1
  |-- Child3

ParentA -- has the children defined by SetX
  |-- Child1
  |-- Child2
  |-- Child4

ParentB -- has the children defined by SetY
  |-- Child1
  |-- Child3

ParentC -- does not match any of the sets
  |-- Child1
  |-- Child2
  |-- Child3
  |-- Child4

M problem was around the users of a system (the parents), which roles they were assigned to within the system (the children), and which job description would fit the user (the sets).

The way I solved it was using a bit mask. Each child is assigned a unique 2^n bitmask. Membership of the set is then simply that the user's bitmask sum equals the bitmask sum of the set.

When there are lots of children and the bitmask is in danger of overflowing, you can use bigint bitmasks or multiple bitmasks (making sure to set the lower-order bitmasks to zero).

Here's an example written in T-SQL - pretty sure it would be simple to translate into MySQL (and I'm happy if someone wants to do that in their own answer).

declare @users table (
    name varchar(10)
)

declare @skills table (
    name varchar(20)
    , id int identity (0, 1)
    , bitmask bigint
)

declare @usersWithSkills table (
    userName varchar(10)
    , skillName varchar(20)
)

declare @groups table (
    name varchar(20)
    , bitmask bigint
)

declare @skillsInGroups table (
    groupName varchar(10)
    , skillName varchar(20)
)

insert  @users (name)
values  ('Pat')
    , ('Oprah')
    , ('Millie')
    , ('Bert')

insert  @skills (name)
values  ('Latin')
    , ('Icelandic')
    , ('Physics')

insert  @groups (name)
values  ('polyglot')
    , ('modern')
    , ('omniscient')

insert  @skillsInGroups (groupName, skillName)
values  ('polyglot', 'Latin')
    , ('polyglot', 'Icelandic')
    , ('modern', 'Physics')
    , ('modern', 'Icelandic')
    , ('omniscient', 'Latin')
    , ('omniscient', 'Icelandic')
    , ('omniscient', 'Physics')

insert  @usersWithSkills (userName, skillName)
values ('Pat', 'Latin')
    , ('Pat', 'Icelandic')
    , ('Oprah', 'Latin')
    , ('Oprah', 'Icelandic')
    , ('Oprah', 'Physics')
    , ('Millie', 'Icelandic')
    , ('Millie', 'Physics')
    , ('Bert', 'Latin')

-- give each skill a bitmask value
update  @skills
set bitmask = power(2, id)

-- set the total bitmask values for each group
update  g1
set g1.bitmask = t.sum_ind
from    @groups g1
    inner join (
        select  g.name, sum_ind = sum(r.bitmask)
        from    @groups g
            inner join @skillsInGroups rg
                on rg.groupName = g.name
            inner join @skills r
                on r.name = rg.skillName
        group   by g.name
    ) t
        on t.name = g1.name

select  u1.userName, groupName = g.name
from    (
        select  userName = u.name
            , bitmask_total = sum(r.bitmask)
        from    @users u
            inner join @usersWithSkills uir
                on uir.userName = u.name
            inner join @skills r
                on r.name = uir.skillName
        group   by u.name
    ) u1
    left join @groups g
        on g.bitmask = u1.bitmask_total

The results I get from this are

userName   groupName
---------- --------------------
Bert       NULL
Millie     modern
Oprah      omniscient
Pat        polyglot

(4 rows affected)
OutstandingBill
  • 2,614
  • 26
  • 38