In Difference between EXISTS and IN in SQL? it's clear that lots of people think "exists is faster if the subquery returns many rows, in is faster if it returns a few rows" but I'd like to know a bit more detail about the actual implementation of each of these keywords and how it could give rise to a difference in performance based on result set size
Lots of people were claiming that EXISTS is simply looking for a true or false (as though IN isn't??) and gives up as soon as a true is found, whereas IN will "scan the entire set" - really? If I were writing a dbms and using a naive loop to establish the truth of 3 IN (1,2,3,4,5)
I'm pretty sure I'd code it up such that if I found 3 after a checking 1 and 2, I wouldn't check 4 and 5 to see if they were also 3
I've a strong suspicion that, for the most part, IN, EXISTS and even JOIN are implemented identically/the query optimisers of modern DBs rewrite/approach these different keywords in the same way anyway - is there a modern database out there that does still exhibit an appreciable performance difference with EXISTS vs IN and what is the actual implementation detail that causes the difference?
Side note; in that question, some references to Oracle 8/SQL Server 2000 were made and the strategies they adopted, and I can see how a performance difference would arise with them, but I don't think these can be classed as modern databases unless the latest iterations of these products still implement these keywords the same as they did 20 years ago...