1

I need to optimise the following query, can anyone help? I know it's the Not Exists part that is causing the problem as it is doing a massive table scan, but I'm new to this, can anyone give any advice?

select count(*)
from Job j
where company = 'A'
and branch = 'Branch123'
and engineerNumber = '000123'
and ID > 60473
and not exists(
select JobNumber, Company, Branch
from OutboundEvents o
where o.JobNumber = j.JobNumber
    and o.branch = j.branch
    and o.company = j.company
    and o.Formtype = 'CompleteJob')
Louise
  • 107
  • 1
  • 12
  • Can you share the the schema of job and outbound events (including indexes), and post an actual execution plan (.sqlplan) somewhere that is easy for us to download? – Aaron Bertrand Sep 26 '12 at 13:17

2 Answers2

7
create index [<indexname>] on [Job] (
    [company], [branch], [engineerNumber], [ID]) include ([JobNumber]);
create index [<indexname>] on [OutboundEvents] (
    [company], [branch], [JobNumber], [Formtype]);

Is not queries you optimize, is the data model you optimize. Start by reading Designing Indexes.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 1
    Would there be any difference if `[JobNumber]` was included in the first index as well? – ypercubeᵀᴹ Sep 26 '12 at 13:23
  • @ypercube: yes, it should be, I didn't notice that – Remus Rusanu Sep 26 '12 at 13:26
  • 3
    Just be aware that adding these indexes to optimize this specific query does not take the rest of your workload into account. If your write:read ratio is high you may pay a high price for maintaining these indexes. I'm not saying that makes this answer wrong, just that you need to take the whole system into account, not just one query. This is why we don't blindly follow the Database Engine Tuning Advisor's advice (or an execution plan's, or the missing index DMVs) that tell us to create indexes based on limited and isolated info. – Aaron Bertrand Sep 26 '12 at 13:28
  • 4
    Agree: other queries in the workload, selectivity of the columns mentioned, read/write ratio all play a role. The gist of it is that the solution is not in the *text* of the query (it seldom is) but is in designing a proper data model. I just don't like giving the omnipresent 'it depends' answer. – Remus Rusanu Sep 26 '12 at 13:44
  • Yes, fair enough. I wasn't trying to criticize the answer, just complement it. It is definitely much more useful than "it depends" (or "here, re-write it the way I like"). – Aaron Bertrand Sep 26 '12 at 13:46
0

Thank you all for your helpful insight. I have much to learn :) I managed to cut the execution time down from 1min7sec to just under 1 second using this query:

select count(*)
from job
where company = 'A'
and branch = 'Branch123'
and EngineerNumber = '000123'
AND id> 60473
AND JobNumber not in(
    select Jobnumber from outboundevents b
    where b.company = 'A'
    AND b.Branch = 'Branch123'  
    and b.Formtype = 'CompleteJob'
    and jobnumber in (
        select jobnumber from Job
        where company = 'A'
        and branch = 'Branch123'
        and engineerNumber = '000123'
        and ID > 60473)
)
Louise
  • 107
  • 1
  • 12
  • I have also put an index on the table as Remus suggested. – Louise Sep 26 '12 at 15:35
  • 1
    You should be aware that if `Jobnumber` is nullable `not in` may have worse performance than `not exists` and have surprising results (the presence of a null in the sub query means no rows will be returned from the outer query) – Martin Smith Sep 26 '12 at 16:06