2

One column in my data is sample, another is category. Duplicate are allowed. Number of unique categories I currently have is 5. Here is a simplified example:

sample category other_columns
122    a
123    a
124    a
125    a
123    b
124    b
125    b
122    c
123    c
124    c
...    ...

I need to select only those samples that exist in all categories (122 is not in 'b' and 125 is not in 'c').

sample category
123    a
124    a
123    b
124    b
123    c
124    c

So, if I run

SELECT category, COUNT(DISTINCT sample, category)
FROM my_table
GROUP BY category

all counts should be the same.

yuk
  • 19,098
  • 13
  • 68
  • 99
  • If the number of categories is always 5, you could add a HAVING count(*) = 5 after your group by clause – Sparky Dec 08 '11 at 19:00
  • can you explain what does duplicate rows refer to? – ajreal Dec 08 '11 at 19:06
  • possible duplicate of [SQL statement - "join" vs "group by and having"](http://stackoverflow.com/questions/477006/sql-statement-join-vs-group-by-and-having) – Bill Karwin Dec 08 '11 at 19:09
  • @ajreal: Duplicates are exist because other columns may have different data. – yuk Dec 08 '11 at 19:38

4 Answers4

3

Try this

select sample,category
from my_table
group by sample,category
having count(*)=5

Actually, the above query is wrong, I misunderstood the question. Try this instead.

select sample,count(distinct category)
from my_table
group by sample
having count(distinct category)>=5
Sparky
  • 14,967
  • 2
  • 31
  • 45
  • I disagree with this, duplicate records are allowed – ajreal Dec 08 '11 at 19:06
  • I don't think this query will work at all. Assuming that the table was unique on `sample, category` and using the OP's example `count(*)` would **always** be 1 – Ben Dec 08 '11 at 19:29
1

This one should do the trick:

SELECT sample 
FROM my_table 
GROUP BY sample
HAVING count(DISTINCT category) >= (SELECT count(DISTINCT category) FROM my_table);
tscho
  • 2,024
  • 15
  • 15
  • Thanks a lot for the answer. Using `having` and counting categories do the trick. I thought which answer to accept, yours or @Sparky's. I choose this one for having addition feature not to rely on the know number of categories. – yuk Dec 09 '11 at 21:59
1

Try thinking the query like this:

"Find me all samples that do not belong to a group of samples that are missing some of the existing categories" -> find all samples that have all the categories.

select m.sample, m.category
from my_table m
where not exists (
  select *
  from my_table m2
  where not exists (
    select *
    from my_table m3
    where m.sample = m3.sample and m2.category = m3.category
  )
)
order by m.category, m.sample;
slaakso
  • 8,331
  • 2
  • 16
  • 27
  • Thank you for taking your answering my question. I'm trying it, but the query takes a lot of time to be completed. – yuk Dec 09 '11 at 22:11
0

You have a table categories ( category, category_description, etc... ) and a foreign key constraint in sample so all your samples must have a category. Then your query is:

select sample
  from ( select distinct sample, category from my_table )
 group by sample
having count(*) = ( select count(*) 
                      from categories )         

This means that you can change the number of categories and it makes no difference. You know exactly what categories each sample has as they're all the ones in categories.

EDIT Taking into account @ajreal's comment...

Ben
  • 51,770
  • 36
  • 127
  • 149
  • Thank you for the answer. I believe it would work if I'd have a separate categories table. I don't have any additional description for categories. – yuk Dec 09 '11 at 22:09