1

I have a query which uses ROW_NUMBER(). I have something like this:

ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum

The query runs pretty fast. However, if I add any reference to the "rnum" column, the query slows to a crawl. So, it appears that just having ROW_NUMBER() is not the issue, but when I use the "rnum" in the actual query, it crawls for like 30 seconds.

Any thoughts?

For Reference, here is the query:

  WITH aquire AS (
    SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
    FROM (SELECT d.trans_id, d.source, 'AquireMedia' AS provider,
                 d.trans_time AS publish_date, '/research/get_news.php?id=' || d.trans_id AS story_link,
                 i.name AS industry_name, s.sector_name, d.headline AS subject, NULL AS teaser,
                 NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(d.trans_id,'AQUIREMEDIA') AS tickers,
                 ROW_NUMBER() OVER (PARTITION BY d.trans_id ORDER BY d.trans_time DESC) as rtnum
          FROM   story_descriptions_3m d, story_tickers_3m t, uber_master_mv m, industry i, ind_sector ix, sectors s, comp_ind c
          WHERE  d.trans_id = t.trans_id
            AND  t.m_ticker = m.m_ticker
            AND  t.m_ticker = c.m_ticker(+)
            AND  c.ind_code = i.ind_code(+)
            AND  i.ind_code = ix.ind_code(+)
            AND  ix.sector_id = s.sector_id(+)  AND s.sector_id = 10 )
    WHERE rtnum = 1),
partner AS (
  SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
  FROM (SELECT CAST(n.story_id AS VARCHAR2(20)) trans_id, n.provider AS source, 'Partner News' AS provider,
               n.story_date AS publish_date, n.link AS story_link, i.name AS industry_name, s.sector_name, n.title AS subject,
               CAST(substr(n.teaser,1,4000) AS VARCHAR2(4000)) AS teaser, NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(n.story_id,'OTHER') AS tickers,
               ROW_NUMBER() OVER (PARTITION BY n.story_id ORDER BY n.story_date DESC) as rtnum
        FROM   news_stories_3m n, news_stories_lookup_3m t, comp_ind c, uber_master_mv m, industry i, ind_sector ix, sectors s
        WHERE  t.story_id = n.story_id
          AND  t.ticker   = m.ticker
          AND  m.m_ticker = c.m_ticker(+)
          AND  c.ind_code = i.ind_code(+)
          AND  i.ind_code = ix.ind_code(+)
          AND  ix.sector_id = s.sector_id(+)  AND s.sector_id = 10 )
   WHERE rtnum = 1)
  SELECT  trans_id, source, provider,
         TO_CHAR(publish_date,'MM/DD/YYYY HH24:MI:SS') AS publish_date,
         UNIX_TIMESTAMP(publish_date) AS timestamp,
         story_link, industry_name, sector_name, subject, teaser, tickers
  FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers,
                         ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
             FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
                        FROM   aquire WHERE rtnum <= 5
                        UNION ALL
                        SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
                        FROM   partner WHERE rtnum <= 5)) 
WHERE rnum BETWEEN 1 AND 1 * 5;
Landon Statis
  • 683
  • 2
  • 10
  • 25
  • 2
    The optimizer is pretty good at making trivial changes that save potentially a lot of time. If you define `rnum` as `row_number(...)` (or anything else for that matter) in a subquery, but then you don't reference it in the main query, the optimizer simply ignores `rnum` in the subquery. You are not having a hallucination - what you are noticing is the optimizer doing its job. Now, perhaps, you will change your question: "suppose I do need `rnum`, is there any way to make the query faster"? The answer is "probably, but it depends on what your query is doing" - you didn't tell us that. –  Jun 22 '21 at 04:26
  • It's somewhat of a long query. I can post it, maybe someone might have an idea....... – Landon Statis Jun 22 '21 at 04:32
  • Your question in its current form is almost too broad. If you have performance issues, you should run `EXPLAIN` on the above query and find the bottlenecks. Once you have done this, then Stack Overflow is a good place to ask with some suggestions on how to improve query performance. – Tim Biegeleisen Jun 22 '21 at 04:34
  • Performance, as long as I do not reference the "rnum" column, is fine. But, if I reference the "rnum" column, whether to use it in criteria or just to return it as a column, then the query is very slow........ – Landon Statis Jun 22 '21 at 04:40
  • If it's to "return it as a column" then there is no possible further optimization - Oracle already has all the information and it optimizes better than we can. If you reference `rnum` in conditions in a main query (for example a condition like `where rnum = 1`) then there are ways to rewrite the query to use other means, not `row_number()`, to achieve the same result with better performance; but exactly how to do that depends on what the query must do, and again - we don't know what that is. –  Jun 22 '21 at 05:13
  • Basically, the RNUM will be used to order the rows, and then choose which range to return, based on other parameters. But you can see in the last line, WHERE rnum BETWEEN 1 AND 1 * 5;, where it is doing some range selecting, based on how the ROW_NUMBER() did the ordering........ – Landon Statis Jun 22 '21 at 06:01
  • Did you try to use the [row_limiting_clause](https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6)? This is the Oracle *native* way to do what you simulate with the `ROW_NUMBER()` – Marmite Bomber Jun 22 '21 at 06:33
  • easiest way to find out is by looking at the plans with and without. If you find a SORT operation added, it explains the difference in performance. – gsalem Jun 22 '21 at 06:58

1 Answers1

0

Let simulate your query on a simple example to demonstrate and explain that you encounter expactable results.

Sample Data

create table tab1 as
select rownum id, lpad('x',3000,'y') pad from dual connect by level <= 1000000;

Now if you run a query below in your IDE, you will instantly see the first page of the result set.

Note, you define the row_numberbut do not use it.

select id, pad from (
 select id, pad,
  row_number() over (order by id) as rnum
 from tab1
)

The answert is in the execution plan below

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  2866M|   135K  (1)| 00:00:06 |
|   1 |  TABLE ACCESS FULL| TAB1 |  1000K|  2866M|   135K  (1)| 00:00:06 |
--------------------------------------------------------------------------

you see that no sorting and filtering is performed, the row_number is simple ignored.

This (fetching only few initial rows and no sorting) explains why the query performs.

Contrary if you constraint on the row_number as follows

SQL> select id, pad from (
  2   select id, pad,
  3    row_number() over (order by id) as rnum
  4   from tab1
  5  ) where rnum between 1 and 5
  6  ;

Elapsed: 00:00:07.80

You observe respectable elapsed time. Again the execution plan provides the answer.

See here how to get the execution plan for your query.

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     5 |  7640 |       |   762K  (1)| 00:00:30 |
|*  1 |  VIEW                    |      |     5 |  7640 |       |   762K  (1)| 00:00:30 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|  2866M|  3906M|   762K  (1)| 00:00:30 |
|   3 |    TABLE ACCESS FULL     | TAB1 |  1000K|  2866M|       |   135K  (1)| 00:00:06 |
-----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNUM">=1 AND "RNUM"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "ID")<=5)

The consequence is that now you must now go throu all records (in your case perform all the joins), which breaks the performance.

To prove it, run simple your performat query with a fetch all option or with added order by clause. It is quite possible that you will get the same non-performant result as in your second query.

Final Remark

Instead of the ROW_NUMBER() you may use the row_limiting_clause

Pass the ordering column from the row_number in the order by clause and use offset and fetch first to limit the result.

select id, pad from (
 select id, pad
 from tab1
) order by id
fetch first 5 rows only;

Under the cover you'll see the identical execution plan using the WINDOW SORT PUSHED RANK as above.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53