4

if you have parent table

create table parent (
  pid int not null,
  name varchar(255)
)

and a parent-child join table

create table parent_child (
  pid int not null,
  cid int not null,
  foreign key (pid) references parent(pid),
  foreign key (cid) references child(cid)
)
create table child(
  cid int not null,
  name varchar(255)
)

How can I find all parent's names where all their children have names in the following list ('dave','henry','myriam','jill').

I don't want to see a parent if they have a child with a different name, but if they have 1 or more children and all their children have names in the list I want to see the parent's name.

I did find this https://stackoverflow.com/a/304314/1916621 that will help me find a parent with children of the exactly those names, but I can't figure out how to the parents who only have children with names in a subset of that list.

Extra points if someone knows performance tradeoff for different approaches.

Community
  • 1
  • 1
e...
  • 53
  • 4

2 Answers2

3
SELECT 
    p.pid, 
    p.name
FROM 
    parent p
WHERE NOT EXISTS (
    SELECT *
    FROM 
        parent_child pc 
        JOIN child c 
            ON pc.cid = c.cid
            AND c.name NOT IN ('dave','henry','myriam','jill')
    WHERE 
        p.pid = pc.pid
) AND EXISTS (
    SELECT *
    FROM 
        parent_child pc 
        JOIN child c 
            ON pc.cid = c.cid
            AND c.name IN ('dave','henry','myriam','jill')
    WHERE 
        p.pid = pc.pid
)

Another method... no sub-queries, but additional DISTINCT needed to eliminate duplication of parent records from joining to the parent_child table.

SELECT DISTINCT
    p.pid, 
    p.name
FROM 
    parent p 
    JOIN parent_child pc_exists ON pc_exists.pid = p.pid
    JOIN child c_exists 
        ON c_exists.cid = pc_exists.cid
        AND c_exists.name IN ('dave','henry','myriam','jill')
    LEFT JOIN parent_child pc_notExists ON pc_notExists.pid = p.pid
    LEFT JOIN child c_notExists 
        ON c_notExists.cid = pc_notExists.cid
        AND c_notExists.name NOT IN ('dave','henry','myriam','jill')
WHERE
    c_notExists.cid IS NULL
Michael Fredrickson
  • 36,839
  • 5
  • 92
  • 109
  • I think is the first correct solution... but if there is a large db with many different names would the select in the "where not exists" take too long? Wonder if it would be possible w/o sub queries. – e... Dec 19 '12 at 21:56
  • 1
    @user1916621 Let me add an example without sub-queries, but I'll bet that both queries will either use the same execution plan, or that the `not exists / exists` query posted here will be faster... since I'll have to add an additional `distinct` operator to the other method. The best way to know which way is faster is to try both. – Michael Fredrickson Dec 19 '12 at 21:58
1

Here is my moderate bet:

Sample tables:

Parent

PID     NAME
1       dad john
2       mum sandy
3       dad frank
4       mum kate
5       mum jean

Child

CID     NAME
11      dave
22      maryam
33      henry
44      maryam
16      jill
17      lina
23      jack
34      jill
55      dave

Parent_Child

PID     CID
1       11
1       16
1       17
2       22
3       33
4       44
2       23
5       55
3       34

Query:

select p.pid, p.name, 
group_concat(c.name) as children
from parent as p
inner join parent_child as pc
on p.pid = pc.pid
join child as c
on pc.cid = c.cid
where c.name
in ('dave','henry','maryam','jill')
group by p.pid
;

Results:

PID     NAME        CHILDREN
1       dad john    dave,jill
2       mum sandy   maryam
3       dad frank   henry,jill
4       mum kate    maryam
5       mum jean    dave

Using REGEXP and GROUP_CONCAT

It is something much better than in or find_in_set for SQL. The change I did, I used the list as a comma delimitted string ;)

*But the issue here: that group_concat string order has to be found in the comman delimitted string.* Unless we make the REGEXP much efficient :)

Query:

select x.pid, x.name,
x.children from(
select p.pid, p.name, 
group_concat(c.name) as children,
  count(c.name) as counts
from parent as p
inner join parent_child as pc
on p.pid = pc.pid
join child as c
on pc.cid = c.cid
group by p.pid) as x
where 'dave,maryam,henry,jill'
REGEXP x.children
;

Results:

PID     NAME        CHILDREN
3       dad frank   henry,jill
4       mum kate    maryam
5       mum jean    dave

*SQLFIDDLE

bonCodigo
  • 14,268
  • 1
  • 48
  • 91
  • +1 For going through the effort of creating the Sql Fiddle, but I think `PID 1` should be excluded because this parent also has a child that's not in the list (`lina`).... – Michael Fredrickson Dec 19 '12 at 22:11
  • @MichaelFredrickson precisely OP wants the parents whose one child ore more children (=all children of one parent) in the list. Tricky `intersect` type of a query, I salute you :) +1 – bonCodigo Dec 19 '12 at 22:13
  • @MichaelFredrickson I just want to get your thoughts on the update I did with `REGEXP`. Do you think with a better regex pattern this could be effective? – bonCodigo Dec 19 '12 at 23:15
  • It looks like its getting the correct results (+1)... as far as efficiency, I'm not sure - probably depends pretty heavily on the data involved. The `REGEXP` will obfuscate any indexes, so that will probably have a bigger performance hit than would be saved / spent by tweaking the regular expression... I'm not much of a regex guy, so unfortunately I can't recommend a more efficient pattern to try... if you end up comparing this approach vs one of the other solutions, I'd be very curious to know how it compares. – Michael Fredrickson Dec 20 '12 at 22:47
  • @MichaelFredrickson thanks for sharing the info. Well I tested out both your queries. Sorry to haunt you on this. Something weird in the way I am executing perhaps, doesn't seem to result in what we need. [SQLFIDDLE](http://sqlfiddle.com/#!2/8b12f/35) – bonCodigo Dec 21 '12 at 10:38