I was asked the following question in my interview for the role of a data engineer which left me completely baffled.
Suppose I have the following tables:
Course(CId, Name, Type), Student(SId, Name), Takes(CId, SId, Semester, Grade)
Takes.CId foreign key references Course.CId
Takes.SId foreign key references Student.SId
Where the attributes in bold are the primary keys and a student can take a single course more than once (in case he has not met the requirements to clear).
Now suppose I have the following queries:
Q1
SELECT SId, Name
FROM Student s
WHERE COUNT(SELECT * FROM Takes t WHERE t.SId=s.SId) > 0
Q2
SELECT DISTINCT SId, Name
FROM Student NATURAL JOIN Takes
The question is the following:
What are the situations where Q1 would be more efficient than Q2 and where Q2 would be more efficient than Q1?