2

This should be common enough and I'm looking for the "best" way to perform this in one SQL query (MySQL).

I have three tables, an items table, a linker table and a tags table. Items can be tagged multiple times, so the linker is a simple foreign key linker table:

items   | linker  | tags  
--------+---------+-------
item_id | item_id | tag_id
...     | tag_id  | name  
--------+---------+-------

I can search items for single tags easily, how would I go to search items that have 2 or more specific tags?

SELECT *, `tags`.`name`
FROM `items`
LEFT OUTER JOIN `linker` USING (`item_id`)
LEFT OUTER JOIN `tags` USING (`tag_id`)
WHERE `tags`.`name` = "tag-a"

How does a sane person perform search for 2 or more tags, an item must have ALL the tags, i.e. an AND query?


Edit: What I have so far is the following, which works and doesn't seem to be slow, but looks crazy:

SELECT `items`.* FROM `items`
LEFT OUTER JOIN `linker` USING (`item_id`)
LEFT OUTER JOIN `tags` USING (`tag_id`)
WHERE (
        `item_id` IN (SELECT item_id FROM linker LEFT JOIN tags USING (tag_id) WHERE name = "tag-a")
    AND `item_id` IN (SELECT item_id FROM linker LEFT JOIN tags USING (tag_id) WHERE name = "tag-b")
    AND `item_id` IN (SELECT item_id FROM linker LEFT JOIN tags USING (tag_id) WHERE name = "tag-c")
    AND `item_stuff` = "whatever"
)
Pascal
  • 16,846
  • 4
  • 60
  • 69
  • if each tag is one per entry, couldn't you use `where concat_tags IN ('tag1','tag2')`? Is this what you mean? or look into `RLIKE` which is similar to a `like` search but also searches throughout a string (like regex). – d-_-b Dec 29 '12 at 23:57
  • I just realized the `concat_tags` version doesn't actually work since it's not recognized by the WHERE clause (no such field), I've updated the question. – Pascal Dec 29 '12 at 23:59
  • 1
    Read this question: **[How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation)**. More than 10 ways to achieve what you want (which is called **Relational Division**). For MySQL, I'd use solution No 5 or 6. – ypercubeᵀᴹ Dec 30 '12 at 01:11
  • Ha, perfect, thanks for that question and the name to search for it. Looks like my current approach, posted in the question, matches solution 4) in the linked question, and is one of the fastest performers. (y) – Pascal Dec 30 '12 at 01:16
  • No, for MySQL, any query with `column IN (SELECT ...)` should be avoided (the benchmarks there are for Postgres). – ypercubeᵀᴹ Dec 30 '12 at 12:43

4 Answers4

1

Assuming the PK for the linker table is (item_id,tag_id), I would use the following:

select *
  from items
  where item_id in (
    select item_id
      from linker
      join tags using(tag_id)
     where name in ('tag1', 'tag2', 'tag3')
     group by item_id
     having count(tag_id)=3
  )
;

The above query should be easy to maintain. You can easily add or subtract required tag names. You just need to make sure the having count matches the number of names in the list.

If the linker table PK is not (item_id,tag_id), then the having clause would have to change to having count(distinct tag_id)=3, though that query may not perform so well, depending on how many duplicate (item_id,tag_id) pairs exist.

Another nice feature about the above is you can easily answer questions like, which items are associated with at least 2 of the following list of tags ('tag1','tag2','tag3'). You just need to set the having count to the correct value.

dbenham
  • 127,446
  • 28
  • 251
  • 390
0

If I understand correctly (which I'm not sure I do :) ... ), you want to find results that contain a certain string (like a regular expression search).

you could try the RLIKE function

SELECT *, `tags`.`name`
FROM `items`
LEFT OUTER JOIN `linker` USING (`item_id`)
LEFT OUTER JOIN `tags` USING (`tag_id`)
WHERE `tags`.`name` RLIKE("tag-a"|"tag-b")

I think this is what you mean, but maybe not:

http://dev.mysql.com/doc/refman/5.0/en/regexp.html


Or if each entry has only one tag per entry, what about using IN :

SELECT *, `tags`.`name`
FROM `items`
LEFT OUTER JOIN `linker` USING (`item_id`)
LEFT OUTER JOIN `tags` USING (`tag_id`)
WHERE `tags`.`name` IN ("tag-a","tag-b")

http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_in


And why not just a basic OR

 WHERE `tags`.`name` = "tag-a" OR `tags`.`name` = "tag-b"

I hope I'm understanding your goal correctly, please let me know if I don't.

edit I mis-read a part of your question...I may not be sane, but hope this doesn't disqualify me :P

d-_-b
  • 21,536
  • 40
  • 150
  • 256
  • 1
    Your second and third example are what I'm looking for, however this only works with `OR`, I need an `AND`. :) – Pascal Dec 30 '12 at 00:10
  • still not sure I understand, but you can use `AND` or `OR` pretty much interchangeably – d-_-b Dec 30 '12 at 00:11
  • 1
    I see your edit, but no, this doesn't work, `tags`.`name` can't be "tag-a" AND "tag-b" at the same time, try it. ;) – Pascal Dec 30 '12 at 00:12
  • wow! thats what I get for not thinking DUHHHHH hahah....`IN` is probably your best bet (if `RLIKE` doesn't suit you). haha I'm still cracking up at the fact that i suggested that lol – d-_-b Dec 30 '12 at 00:14
  • Believe it or not, that's what I did first and wondered why there are no results. ;) The `IN` basically is an OR, so I can't use that, either. – Pascal Dec 30 '12 at 00:18
  • if you dont want `OR` and dont want `AND`, then i don't think i understand the goal here. Do you have two tags in one row? – d-_-b Dec 30 '12 at 00:29
  • One row per tag, and I **do** want `AND`, see my updated question. – Pascal Dec 30 '12 at 00:40
0

To restate your question, you want all columns from table items that have all the tags in some list, is that correct? If so, I think you need to join to your tags table for each and use an INNER JOIN instead of a LEFT OUTER JOIN. Something like this:

SELECT DISTINCT `items`.* 
FROM   `items` a
JOIN   `linker` b 
ON     b.item_id=a.item_id

JOIN   `tags` c1
ON     c1.tag_id=b.tag_id
   and c1.name = "tag-a"

JOIN   `tags` c2
ON     c2.tag_id=b.tag_id
   and c2.name = "tag-a"

JOIN   `tags` c3
ON     c3.tag_id=b.tag_id
   and c3.name = "tag-c"

Using an INNER JOIN will select only rows that have all three tags. I'm not sure how you would do this with a variable number of tags (which I think is what you really want).

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
  • Yes, I want to do this with a variable number of tags, so this is not really elegant. Another user linked a relevant question and it turns out my interim solution is not a bad one, will post that as answer, thanks anyway! – Pascal Dec 30 '12 at 01:17
  • I'd be curious to see the performance difference. Using sub-queries to build lists for `IN` processing is usually inefficient. Again, just curious; I don't use MYSQL myself. – BellevueBob Dec 30 '12 at 01:49
0

Of course this has been asked already: How to filter SQL results in a has-many-through relation

Turns out my interim solution is one of the fastest (number 4 in the linked question), here it is:

SELECT *
FROM `items`
WHERE (
        `item_id` IN (SELECT item_id FROM linker INNER JOIN tags USING (tag_id) WHERE name = "tag-a")
    AND `item_id` IN (SELECT item_id FROM linker INNER JOIN tags USING (tag_id) WHERE name = "tag-b")
    AND `item_id` IN (SELECT item_id FROM linker INNER JOIN tags USING (tag_id) WHERE name = "tag-c")
    AND `item_stuff` = "whatever"
)
Community
  • 1
  • 1
Pascal
  • 16,846
  • 4
  • 60
  • 69
  • There is no reason to be doing an outer join. Logically you want an inner join. The outer and inner join both give the same answer, but the outer join obscures the meaning of the query. Also, I would not rely on the timings in the linked post to choose which is the best approach. The optimum query can change depending on the number of rows in each table, as well as how frequently the tags of interest occur. – dbenham Dec 30 '12 at 04:48
  • @dbenham You mean to use `LEFT INNER JOIN` in the subqueries? The optimum timing was determined on my rig on my data (admittedly not very thoroughly), not just inferred from the linked answer. – Pascal Dec 30 '12 at 15:14
  • 1
    There isn't any such thing as `LEFT INNER JOIN`. `LEFT OUTER JOIN` and `LEFT JOIN` are synonymous. As are `JOIN` and `INNER JOIN` synonymous. Your subquery has no use for the outer join - the outer joined rows get thrown away by your where clause any way, leaving only rows that are retrieved by an inner join. I went back and read Erwin's comments about the various options. He has a good point that the number of tags being looked up can have an impact on performance and optimization. – dbenham Dec 30 '12 at 18:05