3

I have a many-to-many relationship between students and classes as shown below. I would like to get the IDs of all the students who are not registered for any classes.

many-to-many relationship

Because of the size of the dataset, I'd like to avoid using NOT IN. Is there a more efficient solution?

Yes - that Jake.
  • 16,725
  • 14
  • 70
  • 96
  • On MySQL 5+ `NOT IN` is optimized, so it should not be slow. – Johan Sep 20 '11 at 16:22
  • 1
    @Johan: He's not using MySQL. – Steven Rumbalski Sep 20 '11 at 16:25
  • @Jekke: Did you try using `NOT IN` or are you assuming it would be too slow? – Steven Rumbalski Sep 20 '11 at 16:26
  • @StevenRumbalski, The pictures should have told me that, it's just that `not in` slowness is a persistent MySQL myth. I wonder if the same goes for recent versions of SQL-server. – Johan Sep 20 '11 at 16:28
  • @Johan - I just tested a non correlated `NOT IN` in MySQL and it shows up as a dependant sub query so looks to me that it has [the same issues as `In`](http://stackoverflow.com/questions/3417074/why-would-an-in-condition-be-slower-than-in-sql/3417190#3417190). No? – Martin Smith Sep 20 '11 at 16:42
  • @Martin, I guess yes, but this is a `correlated` not in. So I'm confused about the point you are making. – Johan Sep 20 '11 at 16:51
  • @Johan - The `NOT IN` version is `SELECT studentId FROM student WHERE studentID NOT IN (SELECT StudentID FROM student_class)` AFAIK MySQL will re-evaluate `SELECT StudentID FROM student_class` repeatedly for every row in `student` – Martin Smith Sep 20 '11 at 16:56
  • @MartinSmith: it will reevaluate `SELECT StudentID FROM student_class WHERE student_class.StudentID = student.StudentID` repeatedly, in other words run the same nested loops as `LEFT JOIN` would. Run `EXPLAIN EXTENDED` on this query and see the translated query in the warning. – Quassnoi Sep 21 '11 at 01:32
  • @Quassnoi Looks like I put 2 and 2 together and made 5 there. [Should have just read your post on the matter first!](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/) – Martin Smith Sep 21 '11 at 07:39

5 Answers5

8

NOT EXISTS should give you the best performance. See Left outer join vs NOT EXISTS for more details.

SELECT s.StudentID
    FROM student s
    WHERE NOT EXISTS(SELECT NULL
                         FROM student_class sc
                         WHERE sc.StudentID = s.StudentID)
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • The query optimizer will create the same plan for this antijoin that it would for one with a left join. – Jeremy Holovacs Sep 20 '11 at 16:27
  • 1
    @Jeremy: Read the article I cited. – Joe Stefanelli Sep 20 '11 at 16:29
  • I raise a skeptical eyebrow towards that article. I'll bet with a little reordering you can get those numbers to invert. – Jeremy Holovacs Sep 20 '11 at 17:03
  • 2
    @Jeremy - [Well here's another one for you](http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/). Feel free to add an answer with a counter example that demonstrates a case where `LEFT OUTER JOIN ... NULL` performs better. I'll upvote it! – Martin Smith Sep 20 '11 at 17:10
  • here's a good one: http://sqlserverpedia.com/blog/sql-server-bloggers/cage-match-i-anti-joins/ I'm not saying your answer is wrong, but the differences between the NOT EXISTS and LEFT JOIN are for all practical purposes insignificant. – Jeremy Holovacs Sep 20 '11 at 17:25
  • @Jeremy: that's a bad one. The author semi-joins on the concatenated columns `Title +'--'+ LastName` (which changes the query semantics to begin with) then declares `IN` not efficient. The difference between `NOT EXISTS` and `LEFT JOIN` is significant, they yield different plans, and the `NOT EXISTS` plans are never worse (and in majority of cases better). – Quassnoi Sep 21 '11 at 01:30
  • 1
    @Jeremy The point is that the join has to do more work as it has to do a complete join (match and return columns) while the in and exists just have to do semi-joins (match only). The differences aren't usually huge (unless you do silly things like concatenate columns with the IN (completely prevents index seeks), NOT IN has bad performance characteristics when dealing with NULLs (also in my blog post), NOT EXISTS doesn't. – GilaMonster Sep 21 '11 at 08:21
  • I disagree with none of these things. My objection to the provided answer is the preference for NOT EXISTS vs. LEFT JOIN, based on performance, when the performance difference is statistically insignificant, and the instant you get a bit more complex on your joining conditions or your result set, even that insignificant difference can disappear. For some people, the LEFT JOIN syntax makes more sense and is easier to read, and I do not see a reason to guide ppl away from that based upon performance. – Jeremy Holovacs Sep 21 '11 at 12:32
1
select * from student
left join student_class on student_class.studentid = student.studentid
where student_class.scid is null;
Paul Tomblin
  • 179,021
  • 58
  • 319
  • 408
1

An alternative is to use a left join:

SELECT s.student_id
FROM student s
LEFT JOIN student_class sc ON (sc.student_id = s.student_id)
WHERE sc.student_id IS NULL
Johan
  • 74,508
  • 24
  • 191
  • 319
1

The follwoing join query might result the answer

 SELECT student.id
 FROM student
 LEFT JOIN student_class ON student.studentid = student_class.studentid
 WHERE student_class.studentid IS NULL
Ujjwal Manandhar
  • 2,194
  • 16
  • 20
1

You could also use

SELECT StudentID
FROM student 
EXCEPT
SELECT  StudentID
FROM student_class
Martin Smith
  • 438,706
  • 87
  • 741
  • 845