Consider table t:

x  y
1  2
3  4

And the somewhat naive query

SELECT
    ROW_NUMBER() OVER ( ORDER BY x ) row_num,
    x,
    y
FROM
    t
WHERE
    x = 3;

Which (understandably) outputs

row_num x  y
1       3  4

However, I want

row_num x  y
2       3  4

(row_num should be "calculated" over the unfiltered table, so before the WHERE clause is applied)

I tried

SELECT
    ROW_NUMBER() OVER ( ORDER BY t.x ) row_num,
    t1.x,
    t1.y 
FROM
    t, t as t1
WHERE
    t.x = t1.x AND t1.x = 3;

But I get the exact same output.

Will ROW_NUMBER always execute last, or is there a way to force it to execute first, then apply the WHERE clause?

DeepSpace

2 Answers

Window functions are analytical calculations that run on the current scope of the retrieved data. Unlike most programming languages, which execute statements in lexical order, SQL evaluates a query in logical order, not in the order its clauses are written. FROM, JOIN, and WHERE are logically among the first steps evaluated, so window functions are calculated only on the rows that remain after filtering.

In fact SQLite's window function docs intimate this (emphasis added):

A window function is an SQL function where the input values are taken from a "window" of one or more rows in the results set of a SELECT statement.
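
To see that ordering in action, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory copy of the question's table. It assumes the bundled SQLite is 3.25.0 or newer (the release that added window functions); the variable names are only for illustration.

```python
import sqlite3

# In-memory database holding the two-row table t from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])

# WHERE is applied before the window function, so ROW_NUMBER()
# only ever sees the single row with x = 3.
naive = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y "
    "FROM t WHERE x = 3"
).fetchall()

# The self-join from the question is also reduced to one row
# by its WHERE clause before any numbering happens.
self_join = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY t.x) AS row_num, t1.x, t1.y "
    "FROM t, t AS t1 WHERE t.x = t1.x AND t1.x = 3"
).fetchall()

print(naive)      # [(1, 3, 4)]
print(self_join)  # [(1, 3, 4)]
```

Both queries number a one-row result set, which is why the self-join attempt changes nothing.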

Instead of the self-join you tried (which uses the outmoded implicit join syntax rather than an explicit JOIN), you can compute the row number in a CTE or subquery and filter afterwards:

WITH sub AS (
  SELECT 
      ROW_NUMBER() OVER ( ORDER BY x ) AS row_num
      , x
      , y
  FROM
      t
)

SELECT
     sub.row_num
     , sub.x
     , sub.y
FROM 
    sub
WHERE 
    sub.x = 3;

Or, as a subquery:

SELECT sub.row_num
       , sub.x
       , sub.y
FROM (
  SELECT
      ROW_NUMBER() OVER ( ORDER BY x ) AS row_num
      , x
      , y
  FROM
      t
  ) sub
WHERE 
    sub.x = 3;
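
As a quick check, the sketch below runs the CTE form and the subquery form against an in-memory copy of the question's table using Python's sqlite3 module. It assumes SQLite 3.25.0 or newer; the names cte_sql and subquery_sql are just illustrative, and the filter value is passed as a bound parameter rather than hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])

# Number every row first inside the CTE, then filter in the outer query.
cte_sql = """
WITH sub AS (
    SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y
    FROM t
)
SELECT sub.row_num, sub.x, sub.y
FROM sub
WHERE sub.x = ?
"""

# Same idea with a derived table instead of a CTE.
subquery_sql = """
SELECT sub.row_num, sub.x, sub.y
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y
    FROM t
) AS sub
WHERE sub.x = ?
"""

cte_rows = conn.execute(cte_sql, (3,)).fetchall()
subquery_rows = conn.execute(subquery_sql, (3,)).fetchall()

print(cte_rows)       # [(2, 3, 4)]
print(subquery_rows)  # [(2, 3, 4)]
```
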
Parfait
  • Thanks! Assuming this query will execute every time an API endpoint is hit, is one of the approaches you suggested preferable (performance-wise) over the other? – DeepSpace Oct 20 '20 at 15:35
  • Consider benchmarking the two within your environment. There is an ongoing debate about CTE vs. subquery performance, so there is no rule of thumb, though CTEs are arguably more readable. – Parfait Oct 20 '20 at 15:54
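
One rough way to run such a benchmark is sketched below with Python's sqlite3 and timeit modules. The table size (10,000 rows), the target value, the repeat count, and the helper name run are all assumptions made for illustration, and an in-memory database will not reflect a real deployment, so treat the numbers as a starting point only.

```python
import sqlite3
import timeit

# Hypothetical larger table so the timings are not pure noise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", ((i, i + 1) for i in range(1, 10001))
)

CTE_SQL = """
WITH sub AS (
    SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y FROM t
)
SELECT row_num, x, y FROM sub WHERE x = ?
"""

SUBQUERY_SQL = """
SELECT row_num, x, y
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y FROM t
) AS sub
WHERE x = ?
"""


def run(sql, target=5000):
    """Execute one of the two query forms and fetch the matching row."""
    return conn.execute(sql, (target,)).fetchall()


# Time 20 executions of each form; the results depend on the machine.
cte_seconds = timeit.timeit(lambda: run(CTE_SQL), number=20)
subquery_seconds = timeit.timeit(lambda: run(SUBQUERY_SQL), number=20)

print(f"CTE:      {cte_seconds:.4f}s")
print(f"subquery: {subquery_seconds:.4f}s")
```
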
If all you want is 1 row from the table (provided that x is unique), then you can do it without selecting from a subquery or CTE:

SELECT ROW_NUMBER() OVER (ORDER BY x) row_num,
       x,
       y
FROM t
ORDER BY x = 3 DESC LIMIT 1;

See the demo.
Results:

> row_num |  x |  y
> ------: | -: | -:
>       2 |  3 |  4
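
A small sketch of the same query run through Python's sqlite3 module (assuming SQLite 3.25.0 or newer) is shown below. It also illustrates one thing to watch for: because there is no WHERE clause, LIMIT 1 still returns a row when no x matches, so the caller may want to check the returned x. The value 99 is just an arbitrary non-matching example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])

# Row numbers are computed over the whole table; the outer ORDER BY
# then sorts the matching row (x = ? evaluates to 1) to the top.
sql = (
    "SELECT ROW_NUMBER() OVER (ORDER BY x) AS row_num, x, y "
    "FROM t ORDER BY x = ? DESC LIMIT 1"
)

match = conn.execute(sql, (3,)).fetchall()
no_match = conn.execute(sql, (99,)).fetchall()

print(match)          # [(2, 3, 4)]
print(len(no_match))  # 1: a row comes back even though no x equals 99
```
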
forpas