3

I have this table

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

and know that in my data, there is a single job_id (6) which was used for both skill_id 3 and 4:

mysql>  select * from skill_usage;
+----------+--------+
| skill_id | job_id |
+----------+--------+
|        1 |      1 |
|        2 |      2 |
|        3 |      3 |     <----  matches only one part of the AND clause
|        3 |      4 |     <----  matches only one part of the AND clause
|        2 |      5 |
|        3 |      6 |     <==== matches both parts of the AND clause
|        4 |      6 |     <====
|        2 |      7 |
+----------+--------+
8 rows in set (0.00 sec)

Here's what I tried:

SELECT DISTINCT s1.job_id FROM skill_usage AS s1 
  INNER JOIN skill_usage AS s2 ON s1.job_id = s2.job_id
    WHERE s1.skill_id IN (3,4)
    AND   s2.skill_id IN (3,4)

which I thought meant "find all job_id which matches both skill_id 3 and skill_id 4".

Apparently not:

mysql> SELECT DISTINCT s1.job_id FROM skill_usage AS s1
    ->   INNER JOIN skill_usage AS s2 ON s1.job_id = s2.job_id
    ->     WHERE s1.skill_id IN (3,4)
    ->     AND   s2.skill_id IN (3,4);
+--------+
| job_id |
+--------+
|      3 |
|      4 |
|      6 |
+--------+
3 rows in set (0.00 sec)

What am I doing wrongly? How should my query read? I think that it's time for a good book or Udemy course, but none that I own cover self join.

My query is correctly finding job_id = 6, but wrongly (IMO), finding job_id 3 and 4. I would expect them to fail the AND clause.

Eyeslandic
  • 14,553
  • 13
  • 41
  • 54
Mawg says reinstate Monica
  • 38,334
  • 103
  • 306
  • 551
  • 2
    See https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query – Strawberry Aug 15 '20 at 08:20

2 Answers2

4

You may use aggregation here:

SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);

This query should benefit from the following index:

CREATE INDEX idx ON skill_usage (skill_id, job_id);

Both the WHERE and HAVING clauses, as written, are sargable, and should be able to take advantage of this index.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
1

Alternative solution.
(Refer db fiddle)

select s1.job_id
  from skill_usage s1
  where s1.skill_id = 3
    and s1.job_id in (
                       select s2.job_id
                         from skill_usage s2
                        where s2.skill_id = 4
                     )
Abra
  • 19,142
  • 7
  • 29
  • 41
  • I am awarding this one the answer, because it seems easier to combine it to build complex queries. E.g "find job_id which has either skill Id 3 AND 4 OR skill_id 1)" becomes `select distinct s1.job_id from skill_usage s1 where s1.skill_id = 3 and s1.job_id in ( select s2.job_id from skill_usage s2 where s2.skill_id = 4 ) or s1.skill_id=1` See fiddle at https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=6b9a921882953da88078d253adac086b – Mawg says reinstate Monica Aug 15 '20 at 09:42