12

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:

ROW_NUMBER() OVER()

This is particularly useful when used with ordered derived tables, such as:

SELECT t.*, ROW_NUMBER() OVER()
FROM (
    SELECT ...
    ORDER BY
) t

How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it will behave non-deterministically with respect to the order from the derived table:

-- This order here ---------------------vvvvvvvv
SELECT t.*, ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (
    SELECT TOP 100 PERCENT ...
    -- vvvvv ----redefines this order here
    ORDER BY
) t

A concrete example (as can be seen on SQLFiddle):

SELECT v, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM (
  SELECT TOP 100 PERCENT 1 UNION ALL
  SELECT TOP 100 PERCENT 2 UNION ALL
  SELECT TOP 100 PERCENT 3 UNION ALL
  SELECT TOP 100 PERCENT 4
  -- This descending order is not maintained in the outer query
  ORDER BY 1 DESC
) t(v)

Also, I cannot reuse any expression from the derived table to reproduce the ORDER BY clause in my case, as the derived table might not be available as it may be provided by some external logic.

So how can I do it? Can I do it at all?

Community
  • 1
  • 1
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • how about `SELECT NULL`? will it still give invalid result? – John Woo Sep 23 '13 at 14:25
  • @491243: Yes. Well, the result is obviously "valid", but I'm wondering if the behaviour of an empty `OVER()` is really well-defined, or if this works by coincidence on DB2... I'll prepare a SQL Fiddle to illustrate this – Lukas Eder Sep 23 '13 at 14:26
  • You need to put the row_number on the inner query, so If you can't modify this I think you're out of luck. – Laurence Sep 23 '13 at 14:37
  • @Laurence: There are two problems. 1) I don't necessarily have access to the inner query, 2) the inner query might contain `DISTINCT`, in case of which adding `ROW_NUMBER()` would change the semantics of the inner query. – Lukas Eder Sep 23 '13 at 14:38
  • some ways it does work: http://sqlfiddle.com/#!6/167e3/24 seems like the `top 100%` part is an issue? (as in `top 10000000000` works) though if you cannot modify the inner query can you add a top constraint? –  Sep 23 '13 at 16:31
  • @gordatron: Interesting. Note, I'm not modifying the inner query to add `top 100%`, `top N`. I just want to be ready for whatever inner query I receive, e.g. when it has a `TOP .. ORDER BY` clause... – Lukas Eder Sep 23 '13 at 19:53
  • @LukasEder i am no expert, but don't think having an order by is even valid on an inner query without a top or similar to give it meaning.. the other end of the same logic that insists an order to you rank ;) –  Sep 23 '13 at 20:03
  • 1
    Well, that's what I *think* as well. But I'd like to *know* ;-) – Lukas Eder Sep 23 '13 at 20:18
  • Of note is that in your "concrete example" code, it is ambiguous whether the `ORDER BY` applies to only the final query's `TOP` or if it orders the entire set. Try reversing the order of the rows that are `UNIONed` together and see if you get the same result? – ErikE Sep 24 '13 at 23:15
  • @ErikE: In all SQL dialects I'm aware of, `ORDER BY` applies to the outcome of *all* unioned subqueries. In SQL Server, there is ` – Lukas Eder Sep 25 '13 at 05:35
  • @LukasEder If you will examine [this SQL Fiddle](http://sqlfiddle.com/#!6/d41d8/8312) I believe you may see some unexpected results. In your query as is, the `ORDER BY` applies only to the fourth SELECT in the inner query. But when that query is taken out to the top level, the `ORDER BY` functions for BOTH meanings: choosing the particular rows in the final SELECT **as well as** ordering the whole result set. My instinct about there possibly being ambiguity was not wrong. – ErikE Sep 25 '13 at 23:24
  • @ErikE: I still don't think that there's any ambiguity in combining `UNION` and `ORDER BY` :-) Your first example's outer query simply doesn't have any `ORDER BY` clause, and SQL server doesn't give you any guarantees with respect to ordering, then... – Lukas Eder Sep 27 '13 at 08:42
  • @LukasEder You were to notice that the inner `ORDER BY` selected `7` and `6` for the last `UNION ALL`, proving that the `ORDER BY` was being applied to solely the last `SELECT`. In the second query, it did the same thing: selecting `7` and `6` only, and **also** ordering the final result. Thus the final `ORDER BY` in a series of `UNIONs` can function in various ways, so I believe the ambiguity is still present. Note: I had a typo in the fiddle that didn't change its result set but slightly marred it, so [here is an updated SQL Fiddle for you](http://sqlfiddle.com/#!6/d41d8/8312). – ErikE Sep 27 '13 at 17:34

1 Answers1

10

The Row_Number() OVER (ORDER BY (SELECT 1)) trick should NOT be seen as a way to avoid changing the order of underlying data. It is only a means to avoid causing the server to perform an additional and unneeded sort (it may still perform the sort but it's going to cost the minimum amount possible when compared to sorting by a column).

All queries in SQL server ABSOLUTELY MUST have an ORDER BY clause in the outermost query for the results to be reliably ordered in a guaranteed way.

The concept of "retaining original order" does not exist in relational databases. Tables and queries must always be considered unordered until and unless an ORDER BY clause is specified in the outermost query.

You could try the same unordered query 100,000 times and always receive it with the same ordering, and thus come to believe you can rely on said ordering. But that would be a mistake, because one day, something will change and it will not have the order you expect. One example is when a database is upgraded to a new version of SQL Server--this has caused many a query to change its ordering. But it doesn't have to be that big a change. Something as little as adding or removing an index can cause differences. And more: Installing a service pack. Partitioning a table. Creating an indexed view that includes the table in question. Reaching some tipping point where a scan is chosen instead of a seek. And so on.

Do not rely on results to be ordered unless you have said "Server, ORDER BY".

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • I don't entirely agree with your assessment. Take Oracle's `ROWNUM`, for instance. It is quite magic in a way that it guarantees to produce the actual row number on every row. This makes it a very un-relational pseudo-column, as it can be accessed in "semantically challenging" situations, e.g. the `WHERE` clause. A similar "weird" Oracle feature is `FOR UPDATE SKIP LOCKED`, which inverses SQL clause semantics. But SQL isn't 100% relational anyway, so I figured there might be a similar, reliable way of accessing the concrete, materialised tuple order of a given table reference in SQL Server. – Lukas Eder Sep 23 '13 at 20:02
  • Unfortunately, Lukas, I know of no way to get at the "original table order". It's great that Oracle does supply some useful functionality, so perhaps I overstated the case in regards to all relational databases. I do believe, though, that I've gotten it right for SQL Server (and stand ready to be corrected if necessary, of course). – ErikE Sep 23 '13 at 23:55
  • If you need original table order, use a column in the table that specifies an order, and then order by that. In SQL Server, if you order by a clustered index, or by a non-clustered index and only select columns in that index, then it will not perform a sort operation. As such, it's the same as having the "natural" order (i.e., no additional processing work). – siride Sep 24 '13 at 00:40
  • Erik, yes, I'm afraid that you've gotten it right for SQL Server :-) I've noticed how strict it is in imposing a clear semantics on such SQL features, which is probably good, mostly. @siride: I think you might not have entirely understood my question. – Lukas Eder Sep 24 '13 at 05:37