21

How can one select only the items he want in the IN list? for example

select * from pagetags where TagID in (1,2,4)

Now I want all the pages which has all the above 3 IDs assigned to them (1,2,4), not just any of them but all of them?

Is there a way? any other operator? I have already tried = Any and = All but no luck.

Mahmoud Gamal
  • 78,257
  • 17
  • 139
  • 164
Ali
  • 273
  • 1
  • 3
  • 9

4 Answers4

32

The term for this type of problem is relational division. One way below.

SELECT PageID
FROM   pagetags
WHERE  TagID IN ( 1, 2, 4 )
GROUP  BY PageID
HAVING Count(DISTINCT TagID) = 3
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • ok, i should have write my original schema because that is more difficult than the example above :) but you saved me there, thanks a ton. – Ali Sep 24 '12 at 11:49
0

you could try something like this:

SELECT id, Tag FROM (
SELECT id, Tag, COUNT(*) OVER(partition by id) as cnt
FROM pagetags
WHERE Tag in(1,2,4)
GROUP BY id, tag
) a WHERE a.cnt = 3
Pete Carter
  • 2,691
  • 3
  • 23
  • 34
0

The selected answer from Martin Smith

SELECT PageID
FROM   pagetags
WHERE  TagID IN ( 1, 2, 4 )
GROUP  BY PageID
HAVING Count(DISTINCT TagID) = 3

is correct but if speed is a problem then try these.

I have a large table doing the same thing and got 10x better performance with the following.
0.2 seconds versus 2.0 seconds for query returning 272 from a table with 3 million rows.
Also tested on a bigger table with 5 tags and same 10x but now 0.5 versus 5.0.
Index is PageID, TagID with millions of PageID and hundreds of TagID.
Common scenario where many objects are tagged multivalue property.

    SELECT distinct(p1.PageID)
    FROM   pagetags p1
    JOIN   pagetags p2
      ON   p2.PageID = p1.PageID
      AND  p2.TagID = 2 
    JOIN   pagetags p3
      ON   p3.PageID = p1.PageID
      AND  p3.TagID = 4
    WHERE  p1.PageID = 1
 ORDER BY  p1.PageID

or

   SELECT  distinct(PageID)
    FROM   pagetags
    WHERE  TagID = 1
   INTERSECT
   SELECT  distinct(PageID)
    FROM   pagetags
    WHERE  TagID = 2 
   INTERSECT
   SELECT  distinct(PageID)
    FROM   pagetags
    WHERE  TagID = 4 
 ORDER BY  PageID

Prefer the last as with over 5 joins the query optimizer often will make some bad decisions.
And with this have not used up the Group By if you need it for another aggregation.

paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • It depends on the indexes available and the number of tags in the system. Basically the same as comparing [`PIVOT` vs `JOIN`](http://stackoverflow.com/questions/7448453/sql-server-pivot-vs-multiple-join/7449213#7449213). If the number of tags to look at is variable this is not very extendable except if the query is generated via dynamic SQL. – Martin Smith Sep 24 '12 at 12:55
  • @MartinSmith In my case the index is PageID TagID. Less than 1000 TagID but can have millions of PageID. Used by a program. So I gladly take the 10x performance. – paparazzo Sep 24 '12 at 13:24
0
SELECT distinct(PageID)
FROM   pagetags
WHERE  TagID IN (1,2,4)
and PageID in
(select distinct(PageID) from pagetags group by PageID having count(TagID)=3)
group by PageID
ofer
  • 1