11

Suppose I have a "tags" table with two columns: tagid and contentid. Each row represents a tag assigned to a piece of content. I want a query that will give me the contentid of every piece of content which is tagged with tagids 334, 338, and 342.

The "easy" way to do this would be (pseudocode):

select contentid from tags where tagid = 334 and contentid in (
    select contentid from tags where tagid = 338 and contentid in (
        select contentid from tags where tagid = 342
    )
)

However, my gut tells me that there's a better, faster, more extensible way to do this. For example, what if I needed to find the intersection of 12 tags? This could quickly get horrendous. Any ideas?

EDIT: Turns out that this is also covered in this excellent blog post.

Max Cantor
  • 8,229
  • 7
  • 45
  • 59

5 Answers5

25
SELECT contentID
FROM tags
WHERE tagID in (334, 338, 342)
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = 3


--In general
SELECT contentID
FROM tags
WHERE tagID in (...) --taglist
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = ... --tagcount
Amy B
  • 108,202
  • 21
  • 135
  • 185
  • That's it. Although if tagid, contentid is unique, then perhaps DISTINCT is not needed in COUNT(DISTINCT tagID) – tzot Oct 02 '08 at 18:44
  • Very smart solution but it would need a tweak on our installation of SQL2000. We can't use "having" without having an aggregation in the select clause. Easy enough to work around though. – Bob Probst Oct 02 '08 at 18:49
  • I only know t-sql Max, maybe someone else knows about other sql flavors? – Amy B Oct 02 '08 at 19:02
  • I haven't tested this in other DBs, but all of the commands used are part of the SQL standard, so they *should* work. – Powerlord Oct 03 '08 at 15:09
2

Here's a solution that has worked much faster than the for me on a very large database of objects and tags. This is an example for a three-tag intersection. It just chains many joins on the object-tag table (objtags) to indicate the same object and stipulates the tag IDs in the WHERE clause:

SELECT w0.objid

FROM       objtags t0
INNER JOIN objtags t1 ON t1.objid=t0.objid
INNER JOIN objtags t2 ON t2.objid=t1.objid

WHERE t0.tagid=512
  AND t1.tagid=256
  AND t2.tagid=128

I have no idea why this runs faster. It was inspired by the search code in the MusicBrainz server. Doing this in Postgres, I usually get a ~8-10x speedup over the HAVING COUNT(...) solution.

adrian
  • 1,447
  • 15
  • 24
  • You're using set intersection instead of aggregation to determine whether all three values apply. Less thrashing. That's gorgeous, I wish I'd thought of it. – Peter Wone Mar 14 '10 at 05:30
1

The only alternative way i can think of is:

select a.contentid from tags a
inner join tags b on a.contentid = b.contentid and b.tagid=334
inner join tags c on a.contentid = c.contentid and c.tagid=342
where a.tagid=338
albertein
  • 26,396
  • 5
  • 54
  • 57
0

I don't know if this is better but it might be more maintainable

select contentid from tags where tagid = 334
intersect
select contentid from tags where tagid = 338
intersect
select contentid from tags where tagid = 342

You'd have to build it dynamically which wouldn't be as bad as your original solution.

Bob Probst
  • 9,533
  • 8
  • 32
  • 41
-1

What type of SQL? MS SQL Server, Oracle, MySQL?

In SQL Server doesn't this equate to:

select contentid from tags where tagid IN (334,338,342)
Meff
  • 5,889
  • 27
  • 36
  • 1
    No, that would give you all the articles in the 3 tags, he wants all the articles which have the same 3 tags – albertein Oct 02 '08 at 18:42