2

Editted heavily!

The original question was based on a misunderstanding of how IN() treats a column from a results set from a join. I thought IN( some_join.some_column ) would treat a results column as a list and loop through each row in place. It turns out it only looks at the first row.

So, the adapted question: Is there anything in MySQL that can loop through a column of results from a join from a WHERE clause?

Here's the super-simplified code I'm working with, stripped down from a complex crm search function. The left join and general idea are relics from that query. So for this query, it has to be an exclusive search - finding people with ALL specified tags, not just any.

First the DB

Table 1: Person

+----+------+
| id | name |
+----+------+
|  1 | Bob  |
|  2 | Jill |
+----+------+

Table 2: Tag

+-----------+--------+
| person_id | tag_id |
+-----------+--------+
|         1 |      1 |
|         1 |      2 |
|         2 |      2 |
|         2 |      3 |
+-----------+--------+

Nice and simple. So, naturally:

SELECT name, GROUP_CONCAT(tag.tag_id) FROM person LEFT JOIN tag ON person.id = tag.person_id GROUP BY name;
+------+--------------------------+
| name | GROUP_CONCAT(tag.tag_id) |
+------+--------------------------+
| Bob  | 1,2                      |
| Jill | 2,3                      |
+------+--------------------------+

So far so good. So what I'm looking for is something that would find only Bob in the first case and only Jill in the second - without using HAVING COUNT(DISTINCT ...) because that doesn't work in the broader query (there's a seperate tags inheritance cache and a ton of other stuff).

Here's my original sample queries - based on the false idea that IN() would loop through all rows at once.

SELECT DISTINCT name FROM person LEFT JOIN tag ON person.id = tag.person_id 
  WHERE ( ( 1 IN (tag.tag_id) ) AND ( 2 IN (tag.tag_id) ) );                            
Empty set (0.00 sec)

SELECT DISTINCT name FROM person LEFT JOIN tag ON person.id = tag.person_id 
  WHERE ( ( 2 IN (tag.tag_id) ) AND ( 3 IN (tag.tag_id) ) );
Empty set (0.00 sec)

Here's my new latest failed attempt to give an idea of what I'm aiming for...

SELECT name, GROUP_CONCAT(tag.tag_id) FROM person LEFT JOIN tag ON person.id = tag.person_id 
  GROUP BY person.id HAVING ( ( 1 IN (GROUP_CONCAT(tag.tag_id) ) ) ) AND ( 2 IN (GROUP_CONCAT(tag.tag_id)) );
Empty set (0.00 sec)

So it seems it's taking a GROUP_CONCAT string, of either 1,2 or 2,3, and is treating it as a single entity rather than an expression list. Is there any way to turn a grouped column into an expression list that IN () or =ANY() will treat as a list?

Essentially, I'm trying to make IN() loop iteratively over something that resembles an array or a dynamic expression list, which contains all the rows of data that come from a join.

user56reinstatemonica8
  • 32,576
  • 21
  • 101
  • 125
  • So a little more testing and it seems these result are identical if I use = instead of IN. So, maybe, the MySQL optimiser is translating my 'IN' into an '=', is seeing that this gives "P = X AND P = Y", and is therefore returning an empty results set barely touching the database - hence the super fast response times <0.01 seconds? – user56reinstatemonica8 Jan 10 '11 at 16:44
  • Superfast response times are probably due to the WHERE condition that can never be true, thus no rows need to be read. See Joe's answer. – Álvaro González Jan 10 '11 at 16:50

2 Answers2

4

Think about what your code is doing logically:

( 1 IN (tag.tag_id) ) AND ( 2 IN (tag.tag_id) )

is equivalent to

( 1 = (tag.tag_id) ) AND (2 = (tag.tag_id) )

There's no way tag.tag_id can satisfy both conditions at the same time, so the AND is never true.

It looks like the OR version you cited in your question is the one you really want:

SELECT DISTINCT name FROM person LEFT JOIN tag ON person.id = tag.person_id 
  WHERE ( ( 1 IN (tag.tag_id) ) OR ( 2 IN (tag.tag_id) ) );   

Using the IN clause more appropriately, you could write that as:

SELECT DISTINCT name FROM person LEFT JOIN tag ON person.id = tag.person_id 
  WHERE tag.tag_id in (1,2);   

One final note, because you're referencing a column from the LEFT JOINed table in your WHERE clause (tag.tag_id), you're really forcing that to behave like an INNER JOIN. To truly get a LEFT JOIN, you'd need to move the criteria out of the WHERE and make it part of the JOIN conditions instead:

SELECT DISTINCT name FROM person LEFT JOIN tag ON person.id = tag.person_id 
  AND tag.tag_id in (1,2);   
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • Yup, sounds like I misunderstood how IN () works on a results set - I thought it would treat the column as a set it loops through, not as a single row. So, are there any MySQL functions that work how I expected IN to work? Could I pipe in the result of a GROUP_CONCAT for example? – user56reinstatemonica8 Jan 10 '11 at 16:57
  • @user568458: Did you try the final version of the query in my answer? – Joe Stefanelli Jan 10 '11 at 16:59
  • Thanks for the extra details - unfortunately I really do need AND e.g. to get Bob on 1 and 2, not Bob and Jill. Different types of search. Also for the larger query, this part of the query needs to sit next to an OR in a chain of ANDs - in short, it needs to loop in place. Is there anything capable of looping through a column in one WHERE sub clause like this - such as IN on a GROUP_CONCAT or something made using LOOP and CASE logic? – user56reinstatemonica8 Jan 10 '11 at 17:07
  • I've moved that side of the question over to http://stackoverflow.com/questions/4078838/mysql-array-data-type-split-string - on reflection, it is a separate question. – user56reinstatemonica8 Jan 10 '11 at 18:47
0
 WHERE ( ( 1 IN (tag.tag_id) ) AND ( 2 IN (tag.tag_id) ) );     

This will never return any results since tag.tag_id cannot be 1 and 2 at the same time.

Additionally is there a reason you're using 1 IN (blah) rather than blah = 1?

Jim
  • 22,354
  • 6
  • 52
  • 80
  • What I'm trying to do is loop through the results column in tag.tag_id. One row of tag.tag_id can't be 1 and 2 - but my understanding was that IN looped through all results and returned true if any matches were found. Sounds like I was wrong - so if IN () and = are essentially the same here, how can I loop through my join results column? – user56reinstatemonica8 Jan 10 '11 at 16:52
  • Your where clause is just a row filter on the data, if you want to filter a given set of tag_id's, use `tag_id IN (1,2)` Which is similar to `tag_id=1 OR tag_id=2 `. IN matches any of the values you give it in the list, e.g. `tag_id in (1,2,3)` matches tag id if it's 1 , 2 or 3. But that's very different from saying `tag_id in (1) AND tag_id in (2) AND tag_id in (3)`. Turning the clause around as `1 IN (tag_id)` doesn't make it match tag_id in all the rows against 1, it's still a row filter. – nos Jan 10 '11 at 17:13