Someone I've been working with wrote what at first looked like a very redundant query. I was going to remove the redundancy, but then it occurred to me that doing so might actually make it less efficient.

Normally, for optional filter arguments I would write something like this:

SELECT stuff
FROM data
WHERE (@arg1 IS NULL OR Field1 = @arg1)
AND (@arg2 IS NULL OR Field2 = @arg2)
...
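(One variant worth noting for this catch-all style: adding a statement-level `OPTION (RECOMPILE)` hint makes SQL Server compile a fresh plan on every execution, at which point it can fold the `@arg IS NULL` checks to constants and prune the dead predicates, at the cost of recompiling on each call. A sketch:)

```sql
-- Same catch-all query with a recompile hint: the optimizer sees the
-- current parameter values, folds each @argN IS NULL to TRUE/FALSE,
-- and prunes the dead predicates (trading a compile per execution).
SELECT stuff
FROM data
WHERE (@arg1 IS NULL OR Field1 = @arg1)
AND (@arg2 IS NULL OR Field2 = @arg2)
OPTION (RECOMPILE)
```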

The way he wrote his query was more like this:

IF @arg1 IS NOT NULL AND @arg2 IS NOT NULL
BEGIN
   SELECT stuff
   FROM data
   WHERE Field1 = @arg1
   AND Field2 = @arg2
END

IF @arg1 IS NULL AND @arg2 IS NOT NULL
BEGIN
   SELECT stuff
   FROM data
   WHERE Field2 = @arg2  
END

IF @arg1 IS NOT NULL AND @arg2 IS NULL
BEGIN
   SELECT stuff
   FROM data
   WHERE Field1 = @arg1
END

On the one hand, the redundancy seems terrible (especially since the real thing had 12 blocks, 5 arguments, 8 fields, and 2 tables with a join). But on the other hand... this does pull a lot of comparisons out of the query itself. Only exactly the comparisons that need to be made appear in each filter, and the repetitive NULL checks are hoisted to a single place outside of the actual query. (Basically, 'constant' or 'fixed' information is pulled out of the query 'loop'.)

So... is the second version here actually "better" from a performance point of view? I know, premature optimization and all that, and readability is more important, and DRY, etc., but still I wonder: is it more efficient performance-wise? (What I would hope is that these are basically equivalent, and that the DB engine is already doing something like this internally.)
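For completeness, the comments below mention dynamic SQL as the usual way out of the combinatorial blocks: build only the predicates that apply, once, outside the query, and still pass real parameters so plans can be cached per predicate shape. A rough sketch (the `int` parameter types are placeholders, since the real types aren't shown above):

```sql
-- Start from an always-true base and append only the filters that apply.
DECLARE @sql nvarchar(max) = N'SELECT stuff FROM data WHERE 1 = 1'

IF @arg1 IS NOT NULL SET @sql += N' AND Field1 = @arg1'
IF @arg2 IS NOT NULL SET @sql += N' AND Field2 = @arg2'

-- sp_executesql allows declared parameters that the statement doesn't use,
-- so the same call works for every combination of supplied arguments.
EXEC sp_executesql @sql,
    N'@arg1 int, @arg2 int',  -- placeholder types
    @arg1 = @arg1, @arg2 = @arg2
```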

Dave Cousineau
  • The only way to know for sure if it performs better is to run a set of tests and compare the execution plans. However, there is a very high chance that only using the conditions of interest will perform better, because SQL Server frequently performs badly with `OR` conditions, especially a lot of `OR` conditions. But I repeat: the only way to know for sure is to test it. This problem is frequently solved using dynamic SQL to avoid repeating the code, as you mention. That does introduce the issues that dynamic SQL brings, but they are often considered the lesser of two evils in this situation. – Dale K Dec 09 '21 at 22:18
  • Yes, your "redundant" query probably started life as you might expect to see it and ended up how it is in order to improve performance. When it's one or two @variables that are part of the *or* criteria I have done exactly this; when it's more, actually dynamically building a query is more manageable, imho. You can do a contrived example to test, but I would bet the *or @variable is null* criteria will yield an index or table *scan*, and if you remove the *or* condition you'd get an optimal index seek, assuming of course a suitable index existed. – Stu Dec 09 '21 at 22:32
  • This will have exactly the same issue. – Stu Dec 09 '21 at 23:44
  • Generally SQL Server will perform constant evaluation before query execution, i.e. all cases of `@argX IS NULL` will be evaluated before the query is executed. The problem is that the query plan is created on first execution; that plan can end up being a good match for one scenario but a bad plan for others. – Alex Dec 10 '21 at 00:15
  • @Alex That is what I would expect: `@arg1` should be seen as non-variable in the query, which extends to `@arg1 IS NULL`, which makes the expression `TRUE OR ...`, which makes the expression `TRUE`, effectively removing the condition. But I don't know if it does that, or how to tell whether it does. – Dave Cousineau Dec 10 '21 at 00:43
  • If you run your stored procedure/queries from SSMS, you can look at the [query plan used](https://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan-in-sql-server). You can then paste the XML [here](https://www.brentozar.com/pastetheplan/) and add the link to this question for us to have a look. – Alex Dec 10 '21 at 00:58
  • You have made an assumption and fallen into the trap of equating "a very redundant query" with sub-optimal or inefficient or "needs fixing". So stop doing that. Never fix things that are working, no matter how much the code violates your sensibilities. I suggest you read Erland's discussion of [dynamic search conditions](https://www.sommarskog.se/dyn-search.html) and Aaron's [kitchen sink discussion](https://www.sentryone.com/blog/aaronbertrand/backtobasics-updated-kitchen-sink-example) – SMor Dec 10 '21 at 00:58
  • @SMor - he is actually asking the question of what is better or worse, so I do not see the point of your accusations/conclusions ("You have made an assumption and fallen into the trap", 'Never fix things', etc.). The links you have provided are very much useful though. – Alex Dec 10 '21 at 01:13
  • @SMor I get what you're saying, but this query was not actually working and did need fixing beyond this issue, and the problems were directly related to the redundancy: not every block had been updated the right way. Each of the 12 nearly identical blocks had very subtle differences in the comparisons at its top and bottom, and 3 of them were wrong (left the same as other blocks when they should have been changed), which caused no results on certain inputs. – Dave Cousineau Dec 10 '21 at 05:46
  • @SMor that "dynamic search conditions" link seems to say that you *can* use AND/OR similar to my first example (though he uses `...AND @arg1 IS NOT NULL`) and it will be logically reduced to something much more basic, not repeatedly executing the same comparisons. – Dave Cousineau Dec 10 '21 at 17:05
  • Though, according to my tests, the second option seems significantly better: the OR approach results in two index scans, while the many-IFs approach results in index seeks and key lookups. The ORs have an estimated cost of 0.1075, whereas the IFs have 0.0199. – Dave Cousineau Dec 10 '21 at 18:09

0 Answers