ORDER BY subquery and ROWNUM goes against relational philosophy?

Question

Oracle 's ROWNUM is applied before ORDER BY. In order to put ROWNUM according to a sorted column, the following subquery is proposed in all documentations and texts.

select *
from (
  select *
  from table
  order by price
)
where rownum <= 7

That bugs me. As I understand, table input into FROM is relational, hence no order is stored, meaning the order in the subquery is not respected when seen by FROM.

I cannot remember the exact scenarios but this fact of "ORDER BY has no effect in the outer query" I have read more than once. Examples are in-line subqueries, subquery for INSERT, ORDER BY of PARTITION clause, etc. For example in

OVER (PARTITION BY name ORDER BY salary)

the salary order will not be respected in outer query, and if we want salary to be sorted at outer query output, another ORDER BY need to be added in the outer query.

Some insights from everyone on why the relational property is not respected here and order is stored in the subquery ?

Neither ROWNUM nor ORDER BY are relational concepts. – Jeffrey Kemp Apr 26 '16 at 07:15 — Jeffrey Kemp, Apr 26 '16 at 07:15

score 9 · Accepted Answer · answered Apr 20 '16 at 11:52

9

The ORDER BY in this context is in effect Oracle's proprietary syntax for generating an "ordered" row number on a (logically) unordered set of rows. This is a poorly designed feature in my opinion but the equivalent ISO standard SQL ROW_NUMBER() function (also valid in Oracle) may make it clearer what is happening:

select *
from (
  select ROW_NUMBER() OVER (ORDER BY price) rn, *
  from table
) t
where rn <= 7;

In this example the ORDER BY goes where it more logically belongs: as part of the specification of a derived row number attribute. This is more powerful than Oracle's version because you can specify several different orderings defining different row numbers in the same result. The actual ordering of rows returned by this query is undefined. I believe that's also true in your Oracle-specific version of the query because no guarantee of ordering is made when you use ORDER BY in that way.

It's worth remembering that Oracle is not a Relational DBMS. In common with other SQL DBMSs Oracle departs from the relational model in some fundamental ways. Features like implicit ordering and DISTINCT exist in the product precisely because of the non-relational nature of the SQL model of data and the consequent need to work around keyless tables with duplicate rows.

answered Apr 20 '16 at 11:52

nvogel

24,981
1
44
82

1

The [Oracle documentation for ROWNUM](http://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255) even says ROW_NUMBER() is better *8-) – Alex Poole Apr 20 '16 at 12:09
@Kenny: Excellent answer; writing mostly to tell you (Kenny) that I feel exactly the same way about this issue. With that said, I decided to learn Oracle anyway (even though it's just for my own satisfaction for now, I don't work in the field in any way). I personally always use row_number() as sqlvogel has shown, and exactly for the reason you brought up. As you work more with Oracle you will find many reasons to be upset; to me the biggest issue is their treatment of NULL strings: a zero-length string is the same as NULL and vice versa. Perhaps Oracle will fix at least that in the future... – Apr 20 '16 at 12:34
@mathguy [`VARCHAR` vs `VARCHAR2`](http://stackoverflow.com/questions/1171196/what-is-the-difference-between-varchar-and-varchar2) – MT0 Apr 20 '16 at 12:52
@sqlvogel Thanks for the great answer. Some further remarks : 1/ Can you elaborate the part on _implicit ordering and DISTINCT_ because other DBMS have it. _ORDER BY_ by itself is already not relational so I am not sure I got what you meant about the deviation from relational model. 2/I think _SELECT *, col1 FROM T_ will not work. Should be at least _SELECT T.*, col1 FROM T_ . * dont mix up without table reference. – Kenny Apr 20 '16 at 12:57
@mathguy : I know right. That's one good thing about ambidexterity I think. You see both worlds so _weird things_ suddenly appear in one or another. I started with MS SQL which never stop emphasizing on the relational model. Great books, those of itzik ben gan. – Kenny Apr 20 '16 at 13:03
1

@Kenny, the ALL and DISTINCT clauses (include or exclude duplicate rows) are a SQL feature but are unnecessary under the relational model because relational tables and query results never contain duplicate rows. SQLDBMS <> RDBMS. See Hugh Darwen's [Askew Wall](http://www.dcs.warwick.ac.uk/~hugh/#CS319) presentation. – nvogel Apr 20 '16 at 21:27
@sqlvogel : Could it be that the inner query _(SELECT *)_ actually include the pseudocolumns _ROWNUM_ and _ROWID_ beside the real table columns, and those are passed forward to the outer query therefore the order is kept ? Or ROWNUM only comes in action when referred to ? – Kenny Apr 21 '16 at 09:54
The DISTINCT/ALL doesn't only apply to complete rows though, the select list can be a subset of columns which does have duplicates; something like `select distinct hire_date from employees` doesn't mean the relational model is broken? (I'm not arguing your general point, just not sure this part makes sense to me) – Alex Poole Apr 21 '16 at 09:55
1

In relational algebra your SELECT statement is equivalent to a projection: π(hire_date)employees. Projections always return distinct tuples - distinct hire dates in this case. No, that does not mean the RM is broken. – nvogel Apr 21 '16 at 10:29

Alex Poole · Answer 2 · 2016-04-20T13:08:49.853

Not surprisingly really, Oracle treats this as a bit of a special case. You can see that from the execution plan. With the naive (incorrect/indeterminate) version of the limit that crops up sometimes, you get SORT ORDER BY and COUNT STOPKEY operations:

select *
from my_table
where rownum <= 7
order by price;

--------------------------------------------------------------------------------          
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |          
--------------------------------------------------------------------------------          
|   0 | SELECT STATEMENT    |          |     1 |    13 |     3  (34)| 00:00:01 |          
|   1 |  SORT ORDER BY      |          |     1 |    13 |     3  (34)| 00:00:01 |          
|*  2 |   COUNT STOPKEY     |          |       |       |            |          |          
|   3 |    TABLE ACCESS FULL| MY_TABLE |     1 |    13 |     2   (0)| 00:00:01 |          
--------------------------------------------------------------------------------          

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   2 - filter(ROWNUM<=7)

If you just use an ordered subquery, with no limit, you only get the SORT ORDER BY operation:

select *
from (
  select *
  from my_table
  order by price
);

-------------------------------------------------------------------------------           
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |           
-------------------------------------------------------------------------------           
|   0 | SELECT STATEMENT   |          |     1 |    13 |     3  (34)| 00:00:01 |           
|   1 |  SORT ORDER BY     |          |     1 |    13 |     3  (34)| 00:00:01 |           
|   2 |   TABLE ACCESS FULL| MY_TABLE |     1 |    13 |     2   (0)| 00:00:01 |           
-------------------------------------------------------------------------------

With the usual subquery/ROWNUM construct you get something different,

select *
from (
  select *
  from my_table
  order by price
)
where rownum <= 7;

------------------------------------------------------------------------------------      
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |      
------------------------------------------------------------------------------------      
|   0 | SELECT STATEMENT        |          |     1 |    13 |     3  (34)| 00:00:01 |      
|*  1 |  COUNT STOPKEY          |          |       |       |            |          |      
|   2 |   VIEW                  |          |     1 |    13 |     3  (34)| 00:00:01 |      
|*  3 |    SORT ORDER BY STOPKEY|          |     1 |    13 |     3  (34)| 00:00:01 |      
|   4 |     TABLE ACCESS FULL   | MY_TABLE |     1 |    13 |     2   (0)| 00:00:01 |      
------------------------------------------------------------------------------------      

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter(ROWNUM<=7)                                                                  
   3 - filter(ROWNUM<=7)

The COUNT STOPKEY operation is still there for the outer query, but the inner query (inline view, or derived table) now has a SORT ORDER BY STOPKEY instead of the simple SORT ORDER BY. This is all hidden away in the internals so I'm speculating, but it looks like the stop key - i.e. the row number limit - is being pushed into the subquery processing, so in effect the subquery may only end up with seven rows anyway - though the plan's ROWS value doesn't reflect that (but then you get the same plan with a different limit), and it still feels the need to apply the COUNT STOPKEY operation separately.

Tom Kyte covered similar ground in an Oracle Magazine article, when talking about "Top- N Query Processing with ROWNUM" (emphasis added):

There are two ways to approach this:
- Have the client application run that query and fetch just the first N rows.
- Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.

The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top- N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting—how sorts work and what the server would need to do.

... and then goes on to outline what it's actually doing, rather more authoritatively than I can.

Interestingly I don't think the order of the final result set is actually guaranteed; it always seems to work, but arguably you should still have an ORDER BY on the outer query too to make it complete. It looks like the order isn't really stored in the subquery, it just happens to be produced like that. (I very much doubt that will ever change as it would break too many things; this ends up looking similar to a table collection expression which also always seems to retain its ordering - breaking that would stop dbms_xplan working though. I'm sure there are other examples.)

Just for comparison, this is what the ROW_NUMBER() equivalent does:

select *
from (
  select ROW_NUMBER() OVER (ORDER BY price) rn, my_table.*
  from my_table
) t
where rn <= 7;

-------------------------------------------------------------------------------------     
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |     
-------------------------------------------------------------------------------------     
|   0 | SELECT STATEMENT         |          |     2 |    52 |     4  (25)| 00:00:01 |     
|*  1 |  VIEW                    |          |     2 |    52 |     4  (25)| 00:00:01 |     
|*  2 |   WINDOW SORT PUSHED RANK|          |     2 |    26 |     4  (25)| 00:00:01 |     
|   3 |    TABLE ACCESS FULL     | MY_TABLE |     2 |    26 |     3   (0)| 00:00:01 |     
-------------------------------------------------------------------------------------     

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("RN"<=7)                                                                    
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "PRICE")<=7)

Thanks for the input Alex. A thought that crosses my mind : Could it be that the inner query _(SELECT *)_ actually include the pseudocolumns _ROWNUM_ and _ROWID_ beside the real table columns, and those are passed forward to the outer query therefore the order is kept ? Or ROWNUM only comes in action when referred to ? — Kenny, Apr 21 '16 at 09:53
@Kenny - the inner query has its own `ROWNUM` psuedocolumn but as you noticed that is generated before the order-by is applied. You can look at it, and even give it an alias so the outer query can see it. But it won't necessarily (logically) have the same value as `ROWNUM` in the outer query. For example, try: `select t.*, rownum from (select my_table.*, rownum as rn from my_table order by price) t where rownum <= 7;` — Alex Poole, Apr 21 '16 at 10:01

score 1 · Answer 3 · answered Aug 24 '16 at 07:41

Adding to sqlvogel's good answer :

"As I understand, table input into FROM is relational"

No, table input into FROM is not relational. It is not relational because "table input" are tables and tables are not relations. The myriads of quirks and oddities in SQL eventually all boil down to that simple fact : the core building brick in SQL is the table, and a table is not a relation. To sum up the differences :

Tables can contain duplicate rows, relations cannot. (As a consequence, SQL offers bag algebra, not relational algebra. As another consequence, it is as good as impossible for SQL to even define equality comparison for its most basic building brick !!! How would you compare tables for equality given that you might have to deal with duplicate rows ?)

Tables can contain unnamed columns, relations cannot. SELECT X+Y FROM ... As a consequence, SQL is forced into "column identity by ordinal position", and as a consequence of that, you get all sorts of quirks, e.g. in SELECT A,B FROM ... UNION SELECT B,A FROM ...

Tables can contain duplicate column names, relations cannot. A.ID and B.ID in a table are not distinct column names. The part before the dot is not part of the name, it is a "scope identifier", and that scope identifier "disappears" once you're "outside the SELECT" it appears/is introduced in. You can verify this with a nested SELECT : SELECT A.ID FROM (SELECT A.ID, B.ID FROM ...). It won't work (unless your particular implementation departs from the standard in order to make it work).

Various SQL constructs leave people with the impression that tables do have an ordering to rows. The ORDER BY clause, obviously, but also the GROUP BY clause (which can be made to work only by introducing rather dodgy concepts of "intermediate tables with rows grouped together"). Relations simply are not like that.

Tables can contain NULLs, relations cannot. This one has been beaten to death.

There should be some more, but I don't remember them off the tip of the hat.

ORDER BY subquery and ROWNUM goes against relational philosophy?

3 Answers3

Linked