What are the situations where Q1 would be more efficient than Q2 and vice versa?

Question

I was asked the following question in my interview for the role of a data engineer which left me completely baffled.

Suppose I have the following tables:

Course(CId, Name, Type), Student(SId, Name), Takes(CId, SId, Semester, Grade)

Takes.CId foreign key references Course.CId

Takes.SId foreign key references Student.SId

Where the attributes in bold are the primary keys and a student can take a single course more than once (in case he has not met the requirements to clear).

Now suppose I have the following queries:

Q1

SELECT SId, Name
FROM Student s
WHERE COUNT(SELECT * FROM Takes t WHERE t.SId=s.SId) > 0

Q2

SELECT DISTINCT SId, Name
FROM Student NATURAL JOIN Takes

The question is the following:

What are the situations where Q1 would be more efficient than Q2 and where Q2 would be more efficient than Q1?

This wasn't an RDBMS specific question but more of a conceptual query performance and optimisation one. — Jash Shah, Nov 16 '22 at 08:51
Does this [question](https://stackoverflow.com/questions/2577174/join-vs-sub-query) help? — StonedTensor, Nov 16 '22 at 08:58
@StonedTensor can you point me to which answer do you think will help the most? — Jash Shah, Nov 16 '22 at 10:22
The comments in this [answer](https://stackoverflow.com/a/2577224/6023918) in particular have a query similar to yours — StonedTensor, Nov 16 '22 at 10:53

score 1 · Accepted Answer · answered Nov 17 '22 at 03:23

1

Q1 is never efficient! It uses 2 separate queries without joins and requires double resources.

Q2 is better but not always efficient. A NATURAL JOIN will join tables using names of columns that have same names, if there are relationships between the this columns, the join will use index and will work efficient.

The most efficient query is a query where we deliberately use INNER JOIN between columns that have relationships between them. When joins are written between columns that we know have a relation between them, the index created by the relation will be used, which will use the least resources and therefore be the most efficient.

answered Nov 17 '22 at 03:23

safineh

80
9

1

Can you explain more by what you mean by "the join will use index and will work efficient"? – Jash Shah Nov 17 '22 at 07:58
Something like a book index. When you want to access a data from a book, there are 2 ways: 1. Search from the first page to find what you want (which takes a long time) 2. Check the book index to find the find page number and go to it quickly. When data is written in the book, a, some data must write in correct place the `index` too. This makes it take more time to write something in the book (once). But it makes it less time consuming when you want to refer to something in the book. (repeatedly) – safineh Nov 18 '22 at 21:12
The `index` in database is the same as the book index. Databases can create multiple indexes for each column of each table. – safineh Nov 18 '22 at 21:12
This is the document of `CREATE INDEX` command in MySQL: https://dev.mysql.com/doc/refman/8.0/en/create-index.html, In other SQL-based (relational) databases there may be slight variations. But they are usually same. – safineh Nov 18 '22 at 21:21

What are the situations where Q1 would be more efficient than Q2 and vice versa?

1 Answers1