Adding Conditional around query increases time by over 2400%

Question

Update: I will get query plan as soon as I can.

We had a poor performing query that took 4 minutes for a particular organization. After the usual recompiling the stored proc and updating statistics didn't help, we re-wrote the if Exists(...) to a select count(*)... and the stored procedure when from 4 minutes to 70 milliseconds. What is the problem with the conditional that makes a 70 ms query take 4 minutes? See the examples

These all take 4+ minutes:

if (
  SELECT COUNT(*)       
    FROM ObservationOrganism  omo
    JOIN Observation          om  ON  om.ObservationID  = omo.ObservationMicID
    JOIN Organism             o   ON  o.OrganismID      = omo.OrganismID
    JOIN ObservationMicDrug   omd ON  omd.ObservationOrganismID = omo.ObservationOrganismID
    JOIN SIRN                 srn ON  srn.SIRNID        = omd.SIRNID
    JOIN OrganismDrug         od  ON  od.OrganismDrugID = omd.OrganismDrugID
  WHERE
    om.StatusCode IN ('F', 'C')
    AND o.OrganismGroupID <> -1
    AND od.OrganismDrugGroupID <> -1
    AND (om.LabType <> 'screen' OR om.LabType IS NULL)) > 0

print 'records';

-

IF (EXISTS(
  SELECT *       
    FROM ObservationOrganism  omo
    JOIN Observation          om  ON  om.ObservationID  = omo.ObservationMicID
    JOIN Organism             o   ON  o.OrganismID      = omo.OrganismID
    JOIN ObservationMicDrug   omd ON  omd.ObservationOrganismID = omo.ObservationOrganismID
    JOIN SIRN                 srn ON  srn.SIRNID        = omd.SIRNID
    JOIN OrganismDrug         od  ON  od.OrganismDrugID = omd.OrganismDrugID
  WHERE
    om.StatusCode IN ('F', 'C')
    AND o.OrganismGroupID <> -1
    AND od.OrganismDrugGroupID <> -1
    AND (om.LabType <> 'screen' OR om.LabType IS NULL))

print 'records'

This all take 70 milliseconds:

Declare @recordCount INT;
SELECT @recordCount = COUNT(*)       
    FROM ObservationOrganism  omo
    JOIN Observation          om  ON  om.ObservationID  = omo.ObservationMicID
    JOIN Organism             o   ON  o.OrganismID      = omo.OrganismID
    JOIN ObservationMicDrug   omd ON  omd.ObservationOrganismID = omo.ObservationOrganismID
    JOIN SIRN                 srn ON  srn.SIRNID        = omd.SIRNID
    JOIN OrganismDrug         od  ON  od.OrganismDrugID = omd.OrganismDrugID
  WHERE
    om.StatusCode IN ('F', 'C')
    AND o.OrganismGroupID <> -1
    AND od.OrganismDrugGroupID <> -1
    AND (om.LabType <> 'screen' OR om.LabType IS NULL);

IF(@recordCount > 0)
  print 'records';

It doesn't make sense to me why moving the exact same Count(*) query into an if statement causes such degradation or why 'Exists' is slower than Count. I even tried the exists() in a select CASE WHEN Exists() and it is still 4+ minutes.

typo in last query? Unclosed parentheses in ` AND (om.LabType <> 'screen' OR om.LabType IS NULL;` — Alex Yu, Jan 07 '16 at 19:01
@DanKaufman: Dan, please send plans. Your question is very intriguing. — Alex Yu, Jan 07 '16 at 19:35
@BrianWhite I have looked at the query plan for the long running one. It is using indexes for all of the lookups. I have not compared it to the fast query plan. Will do shortly. — DanCaveman, Jan 07 '16 at 19:36
@MartinSmith - can you elaborate. I was not familiar with that term, so I looked it up. I am not sure how row goals cause the Count(*) query operate slower in a conditional. As I understand, a query can be optimized to find the first record quicker at the expense of the entire query. Finding all records (as in count(*)) should never be faster than exists (which only needs to find one record) regardless of how the query is optimized. — DanCaveman, Jan 11 '16 at 22:14
I'm assuming that `if(select count())>0` is being optimised to `exists` — Martin Smith, Jan 11 '16 at 23:40
@MartinSmith - I understand your assumption. I just found this SO article: http://stackoverflow.com/questions/31032353/sql-server-if-exists-massively-slowing-down-a-query, Tim Tom attempts to explain, so it makes a bit more sense; however, I am still a little unsure of why counting every record is takes less time than an Exists, but I will have to read more into it. — DanCaveman, Jan 12 '16 at 03:23
http://blogs.msdn.com/b/bartd/archive/2012/03/14/row-goals-gone-rogue.aspx — Martin Smith, Jan 12 '16 at 06:39

score 4 · Accepted Answer · edited May 23 '17 at 12:15

Given that my previous answer was mentioned, I'll try to explain again because these things are pretty tricky. So yes, I think you're seeing the same problem as the other question. Namely a row goal issue.

So to try and explain what's causing this I'll start with the three types of joins that are at the disposal of the engine (and pretty much categorically): Loop Joins, Merge Joins, Hash Joins. Loop joins are what they sound like, a nested loop over both sets of data. Merge Joins take two sorted lists and move through them in lock-step. And Hash joins throw everything in the smaller set into a filing cabinet and then look for items in the larger set once the filing cabinet has been filled.

So performance wise, loop joins require pretty much no set up and if you're only looking for a small amount of data they're really optimal. Merge are the best of the best as far as join performance for any data size, but require data to be already sorted (which is rare). Hash Joins require a fair amount of setup but allow large data sets to be joined quickly.

Now we get to your query and the difference between COUNT(*) and EXISTS/TOP 1. So the behavior you're seeing is that the optimizer thinks that rows of this query are really likely (you can confirm this by planning the query without grouping and seeing how many records it thinks it will get in the last step). In particular it probably thinks that for some table in that query, every record in that table will appear in the output.

"Eureka!" it says, "if every row in this table ends up in the output, to find if one exists I can do the really cheap start-up loop join throughout because even though it's slow for large data sets, I only need one row." But then it doesn't find that row. And doesn't find it again. And now it's iterating through a vast set of data using the least efficient means at its disposal for weeding through large sets of data.

By comparison, if you ask for the full count of data, it has to find every record by definition. It sees a vast set of data and picks the choices that are best for iterating through that entire set of data instead of just a tiny sliver of it.

If, on the other hand, it really was correct and the records were very well correlated it would have found your record with the smallest possible amount of server resources and maximized its overall throughput.

thank you for the explanation. Reading this and your other posts, the way to correctly fix this is the create some missing indexes or re-order joins, etc... Is that correct? — DanCaveman, Jan 12 '16 at 16:19
There's really no guarantee of ways to get the server to do what you want. Indexes may be overkill unless you have a lot of other queries that want those fields. You can try changing all of your JOIN's to the more explicit INNER HASH JOIN (or the minimal subset that gives you the behavior you want) and the engine will give higher preference to hash joins, but even then it can still override your suggestions if it thinks there's a better way. I suggest just declaring the variable, moving the count(*) out of the condition and moving on personally as fighting the engine is an uphill battle. — Tim Tom, Jan 13 '16 at 21:33
Thanks for the suggestions. I have changed it to a select Count(*), done some meditation, and continue to strive for inner peace every time I read articles about why you "never" use Count(*) to check for existence. I think I have finally moved on :) — DanCaveman, Jan 19 '16 at 17:13

Adding Conditional around query increases time by over 2400%

1 Answers1