0

In a question from over 4 years ago, the poster asked if and how the following code could be simplified:

CREATE Procedure getAllEmployeesByDeptAndFlag
@Dept int,
@sal int,
@Flag int
AS
if @flag = 1
    select Department, Salary
    from employee 
    where Department = @dept and Salary > @sal
else 
    select Department, Salary
    from employee 
    where Department = @dept

The answers to this and similar questions always seem to recommend to short circuit a boolean expression using the logical OR operator. Although, notably, most authors don't actually say it in so many words but just leave you with an example that is just supposed to work. Like the accepted answer in this case which consists of just one line:

select * from employee where Department = @dept AND (@flag != 1 OR Salary > @sal)

Having worked with SQL for quite some time, I was confused when I first saw this. If I'm thinking of a where clause that looks something like the typical example WHERE Country = 'USA' OR Country='Canada', we know that the query won't just stop with the evaluation of the full clause after it finds the first condition to be true, it will always also evaluate the second condition and present the results accordingly.

In the answer mentioned above, however, the second condition is not evaluated if the first is found to be true. While I have found out that this behavior is called short-circuiting, I still find it difficult to understand why it works the way it does. To be clear, I understand how this would work in a procedural language such as C but I don't quite understand why it works in SQL and any more detailed insights to this would be appreciated.

Upon researching the subject further, I came across several articles, such as here and here that further explain that this so called short-circuiting will not always work in SQL. Some answers here on SO seem to go in the same direction. This seems to have to do with the fact that the query optimizer is free to choose whichever path it finds best and this may mean that it will evaluate the conditions in a different order.

Would you consider the accepted answer as the correct answer today? If yes, can you elaborate why you don't see short circuiting as an issue in this specific example? What other, maybe better ways exist to get the same result?

UPDATE: As Gordon noticed, the original procedure had an issue. The second condition read @sal < 10000. So, a variable was compared to a constant, there was no relationship between either with the queried table. I'm sure the original poster meant to compare the constant to a value in the data table, so I rewrote both the procedure as well as the given answer to fit this.

vic
  • 292
  • 3
  • 14
  • This is not a valid `where` clause: `WHERE Country = 'USA' OR 'Canada'`. You seem quite confused. Why is the second comparison on `@sal` -- that is either true or false for all rows and should be part of the `if` condition. – Gordon Linoff Jun 24 '20 at 12:04
  • @GordonLinoff Thanks, I corrected the where clause. I'm not sure I understand your other question, though. – vic Jun 24 '20 at 12:14
  • The optimizer is not guaranteed to short-circuit, nor is it guaranteed to *not* short-circuit. All (sub)-expressions in T-SQL should be valid in all circumstances (that is, regardless of what row they're applied to), otherwise you can get errors (this commonly manifests with an unexpected division by zero). Whether or not splitting a `WHERE` clause is necessary or beneficial for performance depends on the execution plan the optimizer ends up choosing. In your case, as per Gordon's answer, we're only comparing variable values, not row contents, so the split should not be necessary. – Jeroen Mostert Jun 24 '20 at 12:16
  • Ok, I just see that I was too quick to just copy the example over from the other question... the original query seems to have some issues and I think that's what Gordon pointed out. I'll change it to reflect what it should actually look like. My apologies for that. – vic Jun 24 '20 at 12:23
  • 1
    A condition like `where Department = @dept AND (@flag != 1 OR Salary > @sal)` simplifies the query but does not necessarily help with performance, since the optimizer must (generally) generate a plan that works regardless of the value of `@flag` and hence cannot leave out the evaluation of the `Salary` comparison, which may impact index selection even if actual evaluation is cut during execution. A hint like `OPTION (RECOMPILE)` would help there. (Or, of course, splitting up the query, as done in the first snippet.) – Jeroen Mostert Jun 24 '20 at 12:44
  • 1
    Erland Sommarskog has a [detailed article on dynamic search conditions](http://sommarskog.se/dyn-search.html), where this sort of thing commonly pops up, as well as all the pros and cons of how to handle it. To a first approximation, it's simplest to assume the optimizer will not short-circuit anything (for correctness), write queries "naturally", whatever that is to you, and worry about rewriting the queries only if performance isn't up to par. – Jeroen Mostert Jun 24 '20 at 12:48
  • @user3561813 Frankly, I still have a knot in my brain. I believe I understand what you're saying. But coming back to the where clause example in my question, i.e. `WHERE country = 'USA' or country = 'Canada'`, and if I compare this to `WHERE @flag <> 1 or salary > @sal`, I really have issues to understand why in the first example the result set will show both USA and Canada, and in the second example the result set will not apply the filter `salary > @sal`. Banging my head here... – vic Jun 24 '20 at 13:31
  • @JeroenMostert, Thank you for your helpful comments and the link to the article, I'll be sure to read it. Could you please elaborate what you mean when you say "splitting a where clause"? – vic Jun 24 '20 at 13:46
  • As per your first snippet, taking a compound `WHERE` clause and writing two queries with an `IF` -- in other words, doing the short-circuiting yourself with an explicit rewrite. While `OPTION (RECOMPILE)` is more convenient and less repetitive, splitting has the benefit of not needing recompilations, nor presenting you with any surprises on what the optimizer might or might not evaluate. – Jeroen Mostert Jun 24 '20 at 14:07

1 Answers1

1

This addresses the original version of the question.

In this logic:

where Department = @dept AND (@flag <> 1 OR @sal < 10000)

Both conditions are (conceptually) evaluated for all rows in the table. I say conceptually, because a SQL query describes the result set; it does not specify the individual steps. So, rows might be skipped using indexes or partitions, for instance.

That said, the second condition consists only of constant values. A smart (and even not so smart) optimizer will recognize this and evaluate it during the compilation phase. So the query that actually gets executed would be one of:

where Department = @dept and <true>
where Department = @dept and <false>

If the latter, a further step in the optimization phase can say: "Hey, this is always false, so I don't have to do anything."

In all cases, though, the result set is based on what the query describes -- just constant portions may be evaluated up-front rather than row-per-row.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • You seem to have dropped a `@` off of `sal` whilst at the same time the OP has edited their question to make it a column rather than a variable. I was struggling to understand your "the second condition consists only of constant values" until I saw both these things. – Damien_The_Unbeliever Jun 24 '20 at 12:30
  • @GordonLinoff Gordon, it would be great if you could update your answer to reflect the updated question. Same as Damien, I was thrown off by your statement of "only constant values" in the second condition and realized only then that the procedure I copied over from another question had issues. – vic Jun 24 '20 at 12:45