
Using: SQL Server 2016+

I've been looking to see if there is any way to evaluate how long it takes SQL Server to recompile the execution plan for a query. We have several stored procedures where we select from one table based on a nullable parameter that filters against a second table, e.g. if the customer parameter is not null, return that customer's sales; otherwise return all sales.

Sample data:

DROP TABLE IF EXISTS dbo.TestTable1;
DROP TABLE IF EXISTS dbo.TestTable2;

CREATE TABLE dbo.TestTable1 (ID INT NOT NULL PRIMARY KEY CLUSTERED , TextValue NVARCHAR(255) NULL);
CREATE TABLE dbo.TestTable2 (ID INT NOT NULL PRIMARY KEY CLUSTERED , TextValue NVARCHAR(255) NULL);

INSERT INTO TestTable1 (ID, TextValue)
VALUES (1, N'Table 1 - Text 1'),
       (2, N'Table 1 - Text 2'),
       (3, N'Table 1 - Text 3'),
       (4, N'Table 1 - Text 4'),
       (5, N'Table 1 - Text 5'),
       (6, N'Table 1 - Text 6'),
       (7, N'Table 1 - Text 7'),
       (8, N'Table 1 - Text 8'),
       (9, N'Table 1 - Text 9'),
       (10, N'Table 1 - Text 10');

INSERT INTO TestTable2 (ID, TextValue)
VALUES (1, N'Table 2 - Text 1'),
       (2, N'Table 2 - Text 2'),
       (3, N'Table 2 - Text 3'),
       (4, N'Table 2 - Text 4'),
       (5, N'Table 2 - Text 5'),
       (6, N'Table 2 - Text 6'),
       (7, N'Table 2 - Text 7'),
       (8, N'Table 2 - Text 8'),
       (9, N'Table 2 - Text 9'),
       (10, N'Table 2 - Text 10');

This is greatly oversimplified, as we would have multiple possible conditions linking to multiple tables. We are currently looking at recompiling the query so that the joins to the secondary tables are only made when required.

DECLARE @LookupValue NVARCHAR(50);

SET @LookupValue = NULL;

SELECT  *
  FROM  dbo.TestTable1 T1
 WHERE  @LookupValue IS NULL
    OR  EXISTS ( SELECT TOP (1) 1 A FROM dbo.TestTable2 T2 WHERE T1.ID = T2.ID AND T2.TextValue = @LookupValue)
OPTION (RECOMPILE);

SET @LookupValue = N'Table 2 - Text 1';

SELECT  *
  FROM  dbo.TestTable1 T1
 WHERE  @LookupValue IS NULL
    OR  EXISTS ( SELECT TOP (1) 1 A FROM dbo.TestTable2 T2 WHERE T1.ID = T2.ID AND T2.TextValue = @LookupValue)
OPTION (RECOMPILE);

As you can see from the query plan below, with the recompile TestTable2 is effectively removed from the execution.

Execution Plan

There is, however, a cost to recompiling, and I'm looking to trace this overhead so I can make sensible decisions as to which of our queries to format in this way. In general I am seeing that recompiling is consistently faster, but there are lots of posts stating that the cached plans for catch-all queries like this can come out far less than optimal.
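One way to see when and why statements are recompiling (though not how long each compile takes) is an Extended Events session on the `sql_statement_recompile` event. This is only a sketch; the session and file names here are made up, and on a busy server you would want to add a filter on the relevant database before starting it:

```sql
-- sql_statement_recompile fires once per statement each time it is
-- recompiled, and exposes the cause (statistics changed, RECOMPILE
-- hint, schema change, etc.) as an event field.
CREATE EVENT SESSION TraceRecompiles ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
(
    ACTION (sqlserver.sql_text, sqlserver.database_name)
)
ADD TARGET package0.event_file (SET filename = N'TraceRecompiles');

ALTER EVENT SESSION TraceRecompiles ON SERVER STATE = START;
```

For the compile *duration* itself, the actual execution plan XML also carries `CompileTime` and `CompileCPU` attributes on the `QueryPlan` element, which you can inspect for individual statements.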

Any guidance on measuring these overheads or any issues I should investigate before implementing this more widely would be gratefully received.

Many thanks.

Matthew Baker
  • Why use such catch-all queries at all? Recompilation is an attempt to fix the problems generated by catch-all queries. ORMs and LINQ on the client side can ensure that only the required parameters are used when constructing a query. If only one parameter is used, it's actually easier to call different queries or stored procedures depending on whether the parameter is wanted or not. Finally, `TOP` isn't needed. `EXISTS` matches if any result is available, it doesn't need `TOP` to limit the results. An equivalent would be `where T1.ID in (select ID from T2 where T2.Value=@id)` – Panagiotis Kanavos Dec 12 '18 at 10:41
  • Hi Panagiotis. The example here is only to illustrate. We may well be linking out to 7 or 8 different tables dynamically, so 8 parameters would mean 256 separate code paths or stored procedures. I agree that if there was only one parameter I would likely write two queries. As to the difference between using EXISTS versus IN - I'll double check that, but if there are a lot of values in the second table, I believe IN can be very inefficient. My apologies, I can see how my simplified data set would lead you to that conclusion. – Matthew Baker Dec 12 '18 at 10:49
  • I should mention as well, we are using this to come away from some very messy Dynamic SQL. Dynamic SQL would be another approach but is something we are trying to avoid where possible. – Matthew Baker Dec 12 '18 at 10:51
  • Catch-all queries are a mess too. An ORM or Dapper query that contains only the required parameters will be cleaner and run as fast without the recompilation cost – Panagiotis Kanavos Dec 12 '18 at 10:59
  • Hi Panagiotis. I've just been checking on the IN vs EXISTS statements. You are correct, the TOP clause isn't required. EXISTS stops executing as soon as it finds a result. If there are multiple records in the second table, however, the IN statement will be much slower. It's well documented here: https://stackoverflow.com/questions/2065329/sql-server-in-vs-exists-performance . Many thanks for the options though. – Matthew Baker Dec 12 '18 at 10:59
  • IN isn't inefficient. It's *TOP* that's inefficient by forcing an extra operation. In this case though it seems the query optimizer was smart enough to realize that `TOP` wasn't needed and simply eliminated it from the execution plan. I suspect `EXISTS` without TOP, `EXISTS` with TOP and `IN` will generate the same execution plan – Panagiotis Kanavos Dec 12 '18 at 11:01
  • Erland discusses dynamic search conditions and dynamic sql [here](http://www.sommarskog.se/dyn-search.html). Dynamic sql is not "messy" per se - complicated perhaps. The messiness comes from the coder. – SMor Dec 12 '18 at 12:34

2 Answers


You can do what you are asking without OPTION (RECOMPILE); all that hint does is force SQL Server to generate a new plan every time. You want to do whatever you can to avoid putting your branching logic inside the WHERE clause: that just beats up the engine trying to figure out how to create a plan suitable for both of your scenarios, which it cannot do, and it will just create performance issues. What you want instead is branching logic such that there are two distinct queries, each of which can have its own execution plan. You can do this with something along these lines:

DECLARE @CustomerID INT;
SET @CustomerID = (SELECT TOP (1) CustomerID FROM dbo.Table2); -- or, more typically, a procedure parameter

IF @CustomerID IS NULL
BEGIN
    SELECT DataDesired
      FROM dbo.Table1;
END
ELSE
BEGIN
    SELECT T1.DataDesired
      FROM dbo.Table1 T1
           INNER JOIN dbo.Table2 T2
               ON T1.PKey = T2.FKey
     WHERE T2.CustomerID = @CustomerID;
END

This should work fantastically if you want to either pull all the data or a specific set of data. If you go more complex like you suggested, dynamic SQL is probably the better option. Can you still use this method to create a query for each scenario? Sure, if you want. But I can pretty much guarantee that if you try to use branching WHERE-clause logic to serve two completely different queries and then give it any significant amount of data, you will have problems. Making two distinct queries works just as well as putting each query in its own stored procedure and deciding which procedure to call at the web layer (also possible).
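Applied to the tables in the question, the same branching idea might look like this. This is only a sketch; the procedure name is made up:

```sql
CREATE OR ALTER PROCEDURE dbo.GetTestData
    @LookupValue NVARCHAR(50) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    IF @LookupValue IS NULL
    BEGIN
        -- No filter supplied: this branch never touches TestTable2,
        -- and gets its own plan independent of the filtered branch.
        SELECT T1.ID, T1.TextValue
          FROM dbo.TestTable1 T1;
    END
    ELSE
    BEGIN
        SELECT T1.ID, T1.TextValue
          FROM dbo.TestTable1 T1
         WHERE EXISTS (SELECT 1
                         FROM dbo.TestTable2 T2
                        WHERE T2.ID = T1.ID
                          AND T2.TextValue = @LookupValue);
    END
END;
```

Note that each statement inside the procedure is planned separately, so the two branches do not share a plan even though they live in one procedure.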

Lauren Rutledge

To get an insight on compile times, please take a look at:

https://ericblinn.com/quantifying-compile-time

Basically, use SET STATISTICS TIME ON before your query to get console messages on time spent in compilation and execution.
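For example, with the sample tables from the question:

```sql
SET STATISTICS TIME ON;

DECLARE @LookupValue NVARCHAR(50) = N'Table 2 - Text 1';

SELECT T1.*
  FROM dbo.TestTable1 T1
 WHERE @LookupValue IS NULL
    OR EXISTS (SELECT 1
                 FROM dbo.TestTable2 T2
                WHERE T1.ID = T2.ID
                  AND T2.TextValue = @LookupValue)
OPTION (RECOMPILE);

SET STATISTICS TIME OFF;

-- The Messages tab then reports lines of the form:
--   SQL Server parse and compile time: CPU time = ... ms, elapsed time = ... ms.
--   SQL Server Execution Times:        CPU time = ... ms, elapsed time = ... ms.
```

The "parse and compile time" line is the overhead you pay on every execution when OPTION (RECOMPILE) is in play.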

As a side note (not your question), worrying about compilation times might not be the most productive course of action: it is pretty low-level, under-the-hood stuff and unpredictable, IMHO.

If you have two very distinct query patterns, it might be better to create two (or more) distinct stored procs, gated by an entry proc that checks the conditions, each one with its own pattern (inclusion or removal of the nullable parameter), and let the optimizer do its work in peace.
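A minimal sketch of that gating pattern, using the sales example from the question (all procedure names here are made up, and the two inner procedures are assumed to exist):

```sql
-- Entry procedure: routes to a shape-specific procedure so that each
-- inner query keeps its own stable cached plan, with no recompiles.
CREATE OR ALTER PROCEDURE dbo.GetSales_Entry
    @CustomerID INT = NULL
AS
BEGIN
    IF @CustomerID IS NULL
        EXEC dbo.GetSales_All;
    ELSE
        EXEC dbo.GetSales_ByCustomer @CustomerID = @CustomerID;
END;
```

As the comments on the question note, this approach does not scale to many optional parameters (the number of shape-specific procedures grows combinatorially), but for one or two parameters it is simple and plan-cache friendly.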

Forcing reparsing and rebuilding plans while being hit by continuous transactions might not be the most sensible choice.

Also, check the following blog:

https://blogs.msdn.microsoft.com/robinlester/2016/08/10/improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/

It has some insights on OPTION (OPTIMIZE FOR (@string = '')) that might be useful.
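Applied to the query in the question, that hint would look something like this. This is only a sketch; whether NULL or some representative value is the right anchor depends entirely on which parameter shape dominates your workload:

```sql
DECLARE @LookupValue NVARCHAR(50) = N'Table 2 - Text 1';

-- Compile one plan as if @LookupValue were NULL, and reuse it for
-- every execution: no per-execution recompile cost, but non-NULL
-- lookups run on a plan optimized for the "return everything" case.
SELECT T1.*
  FROM dbo.TestTable1 T1
 WHERE @LookupValue IS NULL
    OR EXISTS (SELECT 1
                 FROM dbo.TestTable2 T2
                WHERE T1.ID = T2.ID
                  AND T2.TextValue = @LookupValue)
OPTION (OPTIMIZE FOR (@LookupValue = NULL));
```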

But, as stated before, the conclusion I take from this is not to use RECOMPILE, but to design data access procedures that avoid the need for it, where possible.

jagra