0

I need to select exam results of students in class 7A, but need to look into another table (student_profile) to identify students in 7A (identify by student_id). I wonder which of the following method will be faster, assume index for student_id is created in both tables:

Method 1:

select * from exam_results r
where exists
 (select 1 
  from student_profile p
  where p.student_id = r.student_id
    and p.class = '7A')

Method 2:

select * from exam_results
where student_id in
 (select student_id
  from student_profile
  where class = '7A')

Thanks in Advance,

Jonathan

Jonathan
  • 11
  • 1
  • 3
  • 1
    On any modern sql engine the query execution plan will be the same. – Jodrell May 20 '13 at 09:16
  • +1 for JW. **[Please refer the link to further clarify your doubts](http://stackoverflow.com/questions/5018284/mysql-in-queries-terribly-slow-with-subquery-but-fast-with-explicit-values)** – Prahalad Gaggar May 20 '13 at 09:18
  • 1
    I'd throw Method 3 into the mix, appropriately index my tables, and then suck it and see. – Strawberry May 20 '13 at 09:24
  • @Jodrell I think first query (with EXISTS) will take less time. plan will be different for these both query – pratik garg May 20 '13 at 09:25
  • @pratikgarg, it depends on your flavour of SQL server but, if its one thats worth using, it won't make a difference. They are all essentially the same query in the end. In this simple case the query engine should discern that and build a plan accordingly. – Jodrell May 20 '13 at 09:28
  • If you use your test database and the three querys to generate plans you can answer the question much more effectively than us. Currently only you know the version of your server, what indexes and schema you have and how the tables are populated. – Jodrell May 20 '13 at 09:34
  • @Jodrell I am using oracle database and I read that if we use EXISTS then it will be fast because EXISTS will not scan full table data. On first occurence of the true condition it will return the true. – pratik garg May 20 '13 at 09:35
  • 1
    @pratikgarg, the optimal way to produce the results is the same for all 3 statements. Have the Oracle developers produced an engine that can find the optimal plan for all 3 statements? All 3 are so simple, I suspect yes. If you suspect not, you should create the differing plans and post an answer. However, I don't know if the OP is using Oracle. – Jodrell May 20 '13 at 09:43
  • Let's say there's 20 years worth of data for 20,000 students distributed over 72 classrooms (1-9)*(A-G) - each of whom having undertaken 5 courses. That's 1/72 * 2m = ca. 30000. By any measure, this is trivial for a modern RDBMS, so the question is entirely academic. – Strawberry May 20 '13 at 10:33

2 Answers2

2

Short answer, it doesn't matter. The query engine will treat them the same.

Personally, I'd consider this syntax.

select
            r.*
    from
            exam_results r
        join
            student_profile p
                on p.student_id = r.student_id
    where
           p.class = '7A'

The inner is implicit if omitted.

You'll get the same performance because modern query engines are well developed but, I think this standard syntax is more extendable and easier to read.


If you extend this query in the future, multiple join conditons will be easier to optimize than multiple exists or ins.

Jodrell
  • 34,946
  • 5
  • 87
  • 124
0

If you compare the two queries, then the one with the EXISTS is faster. However the right (and usually faster) approach to such problems is a JOIN.

select r.student_id, r.other_columns
from exam_results r
inner join student_profiles s
on r.student_id = s.student_id
where s.class = '7A'
nakosspy
  • 3,904
  • 1
  • 26
  • 31
  • I'd give you a +1 for the join method, but whether the exists is faster or not depends, as Jodrell says, on too many factors for there to be a generic answer. – David Aldridge May 20 '13 at 09:54
  • Performance of a query in the database depends on many parameters (indexes, statistics, count of records meeting the criteria and much more). It's a good practice to try to optimize a query by examining the access plan. But before trying that, you start with one form of the query. I think that a developer would start with the join query and then optimize. – nakosspy May 20 '13 at 10:18