0

I'm currently taking a course and during one of the tests, I came across this question.

The math_students and english_students tables have the following columns:

student_id, grade, first_name, last_name

Using a subquery, find out what grade levels are represented in both the math and english classes.

The query I used was this.

select distinct grade 
from math_students 
where grade in (
select grade 
from english_students
);

However, it was graded as incorrect and the correct answer was given as

SELECT grade
FROM math_students
WHERE EXISTS (
  SELECT grade
  FROM english_students
);

I would really appreciate it if someone could help me understand the difference in the two queries because the output was the same in both cases. Also, why doesn't the query contain distinct?

SQLrookie
  • 11
  • 5
  • `DISTINCT` is not normally necessary for an `EXISTS` because there is only one table here. Unless in any case the outer table must be distinctified, hard to say from the question. Also note that in the case of `NOT IN` [you can get incorrect results](https://stackoverflow.com/questions/173041/not-in-vs-not-exists) so as force of habit, always use `NOT / EXISTS` – Charlieface Feb 09 '21 at 17:39
  • The reason I'm asking is because distinct displays the grades without repeating them which is essentially what they're asking for. The question is "find out what grade levels are represented in both the math and english classes". Without distinct, the result shows way too many repetitions which doesn't exactly serve the purpose. Having said that, I'm very new to SQL so not sure if I've understood it correctly. – SQLrookie Feb 09 '21 at 17:59
  • If that is what they want then, yesy, you would need a distinct. – Charlieface Feb 09 '21 at 18:02
  • Got it. Thank you. – SQLrookie Feb 09 '21 at 18:09
  • It's such a phoney question though – Strawberry Feb 09 '21 at 21:36

2 Answers2

1

The version with EXISTS is incorrect. Period. It is answering the "question":

Return all grades for math students if there is at least one English student.

Not very useful. The correct EXISTS would be:

SELECT DISTINCT grade
FROM math_students ms
WHERE EXISTS (
  SELECT 1
  FROM english_students es
  WHERE ms.grade = es.grade
);

If your database supports it, I would expect you to also be learning:

select grade
from math_students
intersect
select grade
from english_students;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for the response. Again, I'm very new to SQL so if you could explain the EXISTS query mentioned by you a little more in detail, I'd really appreciate it. – SQLrookie Feb 09 '21 at 18:02
  • Also, could you tell me why the code with IN instead of EXISTS is incorrect. – SQLrookie Feb 09 '21 at 18:03
  • @SQLrookie The `EXISTS` ignore whatever is in the inner `SELECT`, so effectively it just says "Is there any rows *at all* in that table" the answer will always be yes. You need a `WHERE` to correlate the inner query to the outer – Charlieface Feb 09 '21 at 18:04
  • Got it. Thank you. – SQLrookie Feb 09 '21 at 18:17
0

The "In" clause checks if the value is a member of a provided list, this can either be a hard coded list or a query result.

In your example if the table english_students was defined as

id name grade
-- ---- -----
1  Joe  6
2  Bill 8
3  Sue  7

The query:

select distinct grade 
from math_students 
where grade in (
select grade 
from english_students
);

Will evaluate to:

select distinct grade 
from math_students 
where grade in (
"6","8","7"
);

Order of operations executes the query in parens first, then executes the outer query so since you are saying

select grade 
from english_students

It would return "6","8","7" then execute the outer query.

Your second query:

SELECT grade
FROM math_students
WHERE EXISTS (
  SELECT grade
  FROM english_students
);

"Exists" checks for just that Existence of data (or objects), so you are saying select all data in math_students if data exists in english_students as Gordon pointed out.

To answer the question in the title. The big distinction between "In" and "Exists" is: "In" is evaluates to "Give me anything that matches one of these values" where "Exists" evaluates to "Give me anything that exists"

The reason the second does not require a distinct is probably data related.

As Gordon pointed out "intersect" will append multiple results to the same result set and return the distinct records.

Brian K. Burge
  • 114
  • 1
  • 10
  • Why are `IN` queries inefficient? A good DBMS such as SQL Server will transform it into a semi-join anyway, the same as an `EXISTS` – Charlieface Feb 09 '21 at 18:03
  • You are correct, showing my age with older DBMS systems. Edited and removed that line – Brian K. Burge Feb 09 '21 at 18:08
  • So the IN query that I wrote was correct and I don't need to force myself to use NOT/EXISTS to avoid error. Is that correct? – SQLrookie Feb 09 '21 at 18:14
  • Not really, there is one problem with it, the requirement is "Using a subquery, find out what grade levels are represented in both the math and english classes." using your in query would be fine but what if students in a grade in the "english_students" table did not exist in the math_students table? the in query would only return data that is contained in the english_students table, so the intersect query Gordon gave would be a better option. – Brian K. Burge Feb 09 '21 at 18:20
  • I'll definitely use intersect going forward but for the sake of answering the test question, an acceptable answer would need to contain a subquery. I see the difference now but somehow, I'm still confused as to why the existence of students in both tables would matter when the query is based on existence of grades alone. Like I said, I'm new so I'm extremely curious and might be missing something here. – SQLrookie Feb 09 '21 at 19:21
  • Think about it like this math table contains the grades "5,10,6,12" and the English table contains grades "6,9,11,5" if you use an in or exists, you would only select the grades that exist in the math table and matching values in the English table, meaning 9,11 and 12 are not represented. If my thought of the requirement is correct they want a set of all the grades from both tables I.e. "5,6,9,10,11,12" the requirement says "find out what grade levels are represented in both the math and english classes." – Brian K. Burge Feb 10 '21 at 21:13
  • It does not mention only matching records. It may be acceptable to only take matching records, but that is a matter you would need to clarify with your instructor. – Brian K. Burge Feb 10 '21 at 21:13