0

I have the following query on the AdventureWorks2012 database

SELECT  productid,
        productname,
        unitprice,
        CASE
            WHEN unitprice < 20.0 THEN 'LOW'
            WHEN unitprice < 40.0 THEN 'MEDIUM'
            WHEN unitprice >= 40.0 THEN 'HIGH'
        END pricerange
FROM    Production.Products
ORDER BY
    CASE
        WHEN pricerange < 'LOW' THEN 1
        WHEN pricerange < 'MEDIUM' THEN 2
        WHEN pricerange >= 'HIGH' THEN 3
    END ASC
GO

The ORDER BY happens after the SELECT statement, but the pricerange column name cannot be accessed? I assume that pricerange in the SELECT statement is calculated after ORDER BY is called? Why is this?

NomenNescio
  • 2,899
  • 8
  • 44
  • 82
  • 1
    Are you sure this actually works? I'm pretty sure it didn't use to, though some version of SQL Server might have allowed this. What version of SQL Server are you using? In any case what and when is calculated isn't clear cut in SQL - it's a declarative language, not an imperative one, so the order of execution isn't fixed. The execution engine figures out the best way to execute the query (you can include the execution plan to check it out). – Luaan Aug 07 '15 at 14:48
  • @Luaan it does not work, that's why I asked it :) – NomenNescio Aug 07 '15 at 16:03
  • Could it be that the `pricerange` column name is ambiguous? Perhaps there's already a `pricerange` column in the `Products` table? – Luaan Aug 07 '15 at 17:26
  • You know that your ordering doesn't have any sense, right? Because even if it worked, you are comparing strings... `pricerange < 'LOW'` will be true only when `pricerange` is `'HIGH'`, because `'HIGH' < 'LOW'`... – xanatos Aug 10 '15 at 11:47
  • @xantos Yeah I know, I just wanted to know why it didn't work. – NomenNescio Aug 10 '15 at 11:48

2 Answers2

3

SQL is a declarative language, not an imperative one. The order of execution isn't defined, and it isn't always the same.

In any case, order of execution isn't really important. The major point is that of scope - and pricerange isn't in scope anywhere within that select statement. Rather than thinking about the order in which lines of code appear (as in imperative programming), you should think about how each expression wraps another expression.

In this case, you're doing something like this:

Select(OrderBy(From(Products), ...), ...)

You have two ways around this - one option is to use the same case in the order by as the one you use in the select (don't worry, the engine is smart enough not to do the work twice). The second is to wrap your query in another query that does the actual ordering:

select * from
(
  SELECT  productid,
          productname,
          unitprice,
          CASE
              WHEN unitprice < 20.0 THEN 'LOW'
              WHEN unitprice < 40.0 THEN 'MEDIUM'
              WHEN unitprice >= 40.0 THEN 'HIGH'
          END pricerange
  FROM Production.Products
)
ORDER BY
    CASE
       WHEN pricerange < 'LOW' THEN 1
       WHEN pricerange < 'MEDIUM' THEN 2
       WHEN pricerange >= 'HIGH' THEN 3
    END ASC

But keep in mind that you're dealing with expression trees here, not command lists. You're describing what you want, not how it's going to execute. The how is the execution engine's job.

In the end, the execution engine might make the same execution plan for both variants of the query - they aren't really different; although there might be some corner cases around NULLs for example, I'm not sure.

Luaan
  • 62,244
  • 7
  • 97
  • 116
  • What is your reasoning for concluding that I'm doing `Select(OrderBy(From(Products), ...), ...)` instead of `OrderBy(Select(From(Products), ...), ...)` ? – NomenNescio Aug 07 '15 at 16:21
  • @ProgrammerAtWork Because that's what your code says. SQL is not imperative - it is not evaluated in order of writing (although some engines come closer than others - Oracle for example needs ordered joins). In a single select expression, the `select` clause itself is evaluated almost last (it's just a mapping of the result values, after all). It's interesting that the documentation says that `order by` should have the results of the select - in my experience, that has *never* been the case. Perhaps using a column index instead of column name would work, but that's very fragile. – Luaan Aug 07 '15 at 17:21
  • @ProgrammerAtWork Aha! It is indeed a new feature. That's why the documentation doesn't seem to fit. If you want this to work, set your database to compatibility level for MS SQL 2012, and use a 2012 SQL Server. Otherwise, you need to make sure order by *doesn't* reference the select-list. – Luaan Aug 07 '15 at 17:29
  • Unfortunately, I don't know how to refer to the column index inside a `CASE` clause (quick search doesn't find anything), But what makes you think it's a new feature? After setting the compatibility level to 2012 from 2014 it still doesn't compile. – NomenNescio Aug 07 '15 at 17:48
  • @ProgrammerAtWork You can't, that doesn't work with `case`, just plain `order by`. And the select-list order by isn't mentioned in the older documentations, so it's a good bet it's relatively new. If it doesn't work anyway, it's possible it doesn't work in a `case` expression, though that would be even more odd. – Luaan Aug 07 '15 at 17:59
  • Well, I can refer to `pricerange`, this works: `ORDER BY pricerange`. So maybe it's either some undocumented functionality or, however unlikely, a bug in tsql. – NomenNescio Aug 07 '15 at 18:03
  • found a duplicate: http://stackoverflow.com/questions/25763920/why-cant-i-refer-to-a-column-alias-in-the-order-by-using-case – NomenNescio Aug 07 '15 at 18:22
3

SQL Server explains the logical processing order for queries in the documentation.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

Note that this is logical. In practice, this means that the ordering is used during the compilation phase of the query mostly to determine what name references refer to. Also note that the list is a strange mixture of keywords and clauses (ON, WITH, DISTINCT, and TOP are not SQL clauses).

As for your query, the definition of pricerange is determined by the logic in the SELECT. This is then further used in the ORDER BY to get the ordering you want. This is a logical description of the processing.

In practice, I would expect SQL Server to calculate both the pricerange and the ordering priority when it scans the data (SQL Server optimizes such calculations by doing them when the data is read). The pricerange calculation goes into the final result set. The ordering piece is just used by the ORDER BY.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I see, the line `Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list.` in the documentation says as much. However, the documentation also clearly says that I can refer to aliases of `SELECT` from `ORDER BY`. What thought process did you use to identify that this was not the case here? How am I to know beforehand that my query would fail? Intricate knowledge of how the query processor works, or is there another way? – NomenNescio Aug 07 '15 at 16:35
  • @ProgrammerAtWork It's weird. Perhaps the documentation is simply wrong, it happens once in a while. Or it only allows you to refer to the columns by their index, or it requires you to use some special alias, I'm not sure - but I've never seen `order by` use a column from the `select` mapping in the same statement. But even the ORDER BY clause documentation explicitly says `The column names referenced in the ORDER BY clause must correspond to either a column in the select list or to a column defined in a table specified in the FROM clause without any ambiguities.`. Weird. – Luaan Aug 07 '15 at 17:22