9

(SQL Server 2012 being used)

I found some topics on query optimization, and comparing EXISTS to COUNT, but I couldn't find this exact problem.

I have a query that looks something like this:

select * from
tblAccount as acc
join tblUser as user on acc.AccountId = user.AccountId
join tblAddress as addr on acc.AccountId = addr.AccountId
... **a few more joins**
where acc.AccountId in (
    select * accountid from
    (select accountid, count(*) from tblUser
    where flag = 1
    group by accountId) as tbl where c != 1

This query runs in an instant (although the db is quite big, around 70Gb).

When I wrap the query in an EXISTS as in:

if exists
(
  **Exact same query as above**
)
begin
RAISERROR('Account found without exactly one flagged user.', 16, 1);
end
else
begin
  print 'test passed.'
end

Suddenly the query takes about 5-6 seconds to complete. I've tried specifying IF EXISTS (SELECT TOP 1 FROM... and also tried NOT EXISTS (which was even slower). But neither work to speed this up.

If the normal select query completes basically instantly, then does anyone know why wrapping it in the EXISTS causes so much extra computation? And/or anyone have any ideas to work around this (I'm just trying to throw an error if any records are found at all by the original query).

Thanks!

John Darvill
  • 1,274
  • 11
  • 17
  • Is it exact same query wrapped in `EXISTS` ? Selecting top 1 won't do anything, because exists scans table and if there's record found, returns true and doesn't bother about other calculations afaik. But first think I'd start with would be execution plans. – Evaldas Buinauskas Jun 24 '15 at 16:46
  • Yeah exact same query. Have edited the original question to make this more clear. Not sure what you mean by execution plans? – John Darvill Jun 24 '15 at 16:53
  • maybe the normal query is cached and not actually querying the data every time as opposed to the exist() – JamieD77 Jun 24 '15 at 16:55
  • Execution plans show what exactly has been done during query execution. You can enable it as seen in link and then screenshot and upload both plans here, so we can see what happened. http://i.imgur.com/oiLRdLb.png – Evaldas Buinauskas Jun 24 '15 at 16:56
  • @user1221684 - SQL Server doesn't cache actual query results. Just compiled execution plans in the plan cache and the underlying data/index pages in the buffer pool. – Martin Smith Jun 24 '15 at 17:10

6 Answers6

12

Did you try running the original query with TOP 1? most likely it will be just as slow.

Sometimes when the optimizer thinks that something is very likely and going to return a vast set of data with little effort (i.e. almost all records are going to get returned), it chooses mostly loop joins because it only needs to get the first one and a loop join is good for only getting a couple records. When that turns out to not be true, it takes forever and a day to get results.

In your case, it sounds like it's very rare, so this choice hurts badly. Try instead doing something like SELECT @count = COUNT(*) FROM ... and then checking if that count is non-zero.

Tim Tom
  • 779
  • 3
  • 6
  • Yep agree. Imagine this will be a [row goal](http://sqlblog.com/blogs/paul_white/archive/2010/08/18/inside-the-optimiser-row-goals-in-depth.aspx) issue. – Martin Smith Jun 24 '15 at 16:55
  • Thanks for this answer! Can't say I completely understand, but I think I get the gist. And the workaround worked perfectly. – John Darvill Jun 26 '15 at 10:59
  • I stumbled across this after I asked my question: http://stackoverflow.com/questions/31032353/sql-server-if-exists-massively-slowing-down-a-query Is this the same reason for my the performance loss from 70ms to 4+ minutes? – DanCaveman Jan 11 '16 at 22:19
  • Given the behavior you're seeing, yes I would say it's the same thing. I'll try to explain again on your question. – Tim Tom Jan 12 '16 at 04:08
3

I've fought this issue as well.

The query was 10ms when I ran it on its own but once I put it in the If Exists it went to 4 minutes. No matter what I tried it didn't go back to 10ms. The issue was re-produced on 4 different servers but not on 2 servers. Servers all had the same db backup and same patch level of mssql 2012. servers were on different OS and varying hardware settings.

I tried

  • adjusting the max memory grant - no affect
  • changing the threshold for parallelism - no affect
  • rewrite the query to make it simpler - no affect
  • use top 1 - no affect
  • cleared the cache between changes - no affect
  • break the query into some indexed views where I could (can't do it to parts using outer join) - no affect
  • applied recommended missing index - reduced time from 4 to 3 minutes but still not the 10 ms I expected.
  • change the outer join to a where not in (sub-query) - no affect
  • run sp_updateStats - no affect

The only solution that worked for me was to put the results in a temp table and do the if exists against that temp table.

SELECT top 1 1 AS junk INTO #me FROM yourCraxyQueryHere IF EXISTS ( SELECT 1 FROM #me ) SELECT GETDATE()

hope this helps

Jack B
  • 31
  • 2
  • ***how-to samples*** for `adjusting the max memory grant`, `changing the threshold for parallelism` ,`cleared the cache between changes` ,`break the query into some indexed views` ,`applied recommended missing index` and `run sp_updateStats` ? – Kiquenet Aug 04 '17 at 08:07
  • I had a similar case where the query went from ~5 seconds to 12 minutes when wrapped in the IF EXISTS. I selected into a temp table and then did the IF EXISTS on the temp table and it's back to 5 seconds. – Robert Sievers Apr 01 '19 at 21:26
0

Try:

if exists
(
  select 1 from... etc
)
Matt
  • 74,352
  • 26
  • 153
  • 180
Cee
  • 11
  • 1
  • 3
    Since the `IF EXISTS` really only check for **existance** and doesn't actually return any data, this is pointless and really identical to the `IF EXISTS (SELECT * ....)` – marc_s Jun 24 '15 at 16:50
  • “The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed. “SELECT 1″ will avoid having to examine any unneeded metadata for that table during query compilation.” – Cee Jun 24 '15 at 16:55
  • @marc_s - It will happen that it inspects column metadata but it happens with `1` as well. – Martin Smith Jun 24 '15 at 16:56
  • 2
    @MartinSmith: OK - but the point is `IF EXISTS (SELECT * ...)` and `IF EXISTS (SELECT 1 ....)` are **identical** and at least in SQL Server produce the **exact same execution plan** - so nothing is gained by using `SELECT 1` instead of `SELECT *` - correct? – marc_s Jun 24 '15 at 16:58
  • 1
    @marc_s - Yes. The `1` vs `*` and effects of expanding column metadata is looked at in my answer here http://stackoverflow.com/a/6140367/73226. `SELECT 1 WHERE EXISTS (SELECT 1 FROM T); ` can still fail unexpectedly due to lack of column permissions on `T` showing that permissions are checked. And the slower compilation relative to number of columns occurs with both `1` and `*`. – Martin Smith Jun 24 '15 at 16:59
  • @marc_s One advantage that makes it slightly less pointless when using `select 1 from...` instead of `select * from...` is that it signals intent, in that we don't plan to use any of those columns. Yes, intent is semi-obvious because it's encased in an `exists` check, but I think `select 1` still adds to readability. – Geoff Aug 28 '23 at 22:31
0

What it works for me was to set the result of the query into a variable, then compare the variable, dont ask me why, it just woked for me. Cannot explain it

  • It would be helpful to edit your answer to include the SQL script described. You can redact or rename sensitive meta data values as needed. – JohnH Apr 26 '21 at 22:55
-1

You've got, what, three nested subqueries? Subqueries are always slow. Can you convert at least one of them to a join? As in:

select acc.AccountId from tblAccount as acc
    join tblUser as user on acc.AccountId = user.AccountId
    join tblAddress as addr on acc.AccountId = addr.AccountId
    join (select accountid, count(*) as c from tblUser
          where flag = 1
          group by accountId) as tbl ON tbl.accountid = user.accountid
    where tbl.c != 1
Mike K
  • 486
  • 3
  • 7
-2

Try SELECT 1 instead of top 1 *. You don't actually need to return data, you're just checking for the existence of a record.

You could also try clearing the query cache if it's a stored procedure, like this:

To get the plan handle:

SELECT qs.plan_handle
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
and p.name like '%SprocName%'
ORDER BY qs.execution_count DESC OPTION (RECOMPILE);

Then clear the cache by putting the handle into this call:

DBCC FREEPROCCACHE (0x05000F00C616D37C40E15E64010000000000000000000000);
JonathanS
  • 75
  • 2
  • 7