Window functions SORT costly, can we overcome it?

Question

My Requirement: Identify top 10 accounts of a DEPT_NUM , ordered by the account number in ascending order.

Query:

SELECT * FROM
(
  select acctnum,dept_num,row_number() OVER (PARTITION BY DEPT_NUM ORDER BY ACCTNUM) as row_identifier
   FROM MYTABLE
)
WHERE row_identifier between 1 and 10;

Trace:

    7532 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1480074522

--------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |   577K|    15M|       |  3855   (1)| 00:00:47 |
|*  1 |  VIEW                    |         |   577K|    15M|       |  3855   (1)| 00:00:47 |
|*  2 |   WINDOW SORT PUSHED RANK|         |   577K|  7890K|    13M|  3855   (1)| 00:00:47 |
|   3 |    INDEX FAST FULL SCAN  | IMTAB05 |   577K|  7890K|       |   987   (1)| 00:00:12 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ROW_IDENTIFIER">=1 AND "ROW_IDENTIFIER"<=5)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPT_NUM" ORDER BY "ACCTNUM")<=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          2  db block gets
       4298  consistent gets
          0  physical reads
          0  redo size
     144367  bytes sent via SQL*Net to client
        486  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       7532  rows processed

Index:

The index scan says, INDEX STORAGE on a column DEPT_NUM.

Forcing Full Table scan made cost from 3855 to 11092

Total number of rows in the table is 632667;

All the above are test region results. Production actually has twice the amount.

My Database is Exadata, Quarter RAC. Running Oracle 11g R2. The databse is powerful enough to execute it instantly, But DBA were reluctant on the tempSpc of 13M. Business reported the frequency of this report would be 4 times an hour. And Main thing is, this table gets a Lot of real time inserts/updates

Can we improvise the process like
1) Increasing the PGA for a session?(Not sure, if it really possible?)
2) Will An additional index help?

Just want some different eyes to look on this, as our group is totally focusing on the DBA parameters only.

Thanks for any kind of suggestions!

I am not sure but would row_identifier >= 10; work better here because it doesn't have to evaluate to two different things. Since the between statement is essentially doing a <= 1 and a >=10. — Phil, Jul 07 '14 at 21:50
You might consider asking this at http://dba.stackexchange.com/ as well for DBA advice. — Michael McGriff, Jul 07 '14 at 22:02
@ConradFrix The prod environment pointed the usage as `24M`. We haven't concluded our threshold number yet. Shall update the same sooner! Our e-commerce app uses the same database, and it is optimised for `OLTP` — Maheswaran Ravisankar, Jul 07 '14 at 22:03
The requirement says "a DEPTNUM" but the query returns "all DEPTNUM". Does the application really display thousands of rows or is there a bind variable somewhere? — Jon Heller, Jul 08 '14 at 04:51
Sorry @jonearles , it is all the departments.. it is a report actually. There are n departments , which is in another master table. Haven't thought of running it in batch for every department number. — Maheswaran Ravisankar, Jul 08 '14 at 04:53
How is this "top 10" when all you are doing in a sort by acctnum, typically "top accounts" are determined by how much they buy from you. — Paul Maxwell, Jul 08 '14 at 06:13
TOP is something we define it as critical customers. We have a naming convention. (starting with 0 for most critical and so on) sorted ascending the first would be the most critical. the account generation process decides it. — Maheswaran Ravisankar, Jul 08 '14 at 06:45
I think your dba is being overly conservative. That said if this really is a concern I don't think the only things you can do is restructure your data, use an older style cursor solution for top n per groups or use resource manager to cap the resources. — Conrad Frix, Jul 08 '14 at 21:25

Jon Heller · Accepted Answer · 2014-07-09T04:57:07.420

Analytic function performance may depend on the index column order. Changing the index from (ACCTNUM,DEPT_NUM) to (DEPT_NUM,ACCTNUM) may lower the cost and remove the need for temporary tablespace.

partition by COL_2 order by COL_1 => INDEX FAST FULL SCAN|WINDOW SORT PUSHED RANK
partition by COL_1 order by COL_2 => INDEX FULL SCAN|WINDOW NOSORT

INDEX FAST FULL SCAN uses faster multi-block IO but it also requires sorting the data and possibly temporary tablespace for the sort area.

INDEX FULL SCAN uses slower single-block IO but it returns the data in order and avoids sorting.

Sample schema and data

--drop table mytable;
create table mytable(dept_num number not null, acctnum number not null
    ,a number, b number, c number, d number, e number);
insert into mytable
select 1 dept_num, 1 acctnum, 0,0,0,0,0 from dual union all
select 1 dept_num, 2 acctnum, 0,0,0,0,0 from dual union all
select 1 dept_num, 3 acctnum, 0,0,0,0,0 from dual union all
select 2 dept_num, 1 acctnum, 0,0,0,0,0 from dual union all
select 2 dept_num, 2 acctnum, 0,0,0,0,0 from dual union all
select 3 dept_num, 1 acctnum, 0,0,0,0,0 from dual;
--Create 600K similar rows.
insert into mytable
  select dept_num + rownumber*3, acctnum, a,b,c,d,e
  from mytable
  cross join (select level rownumber from dual connect by level <= 100000);
begin
    dbms_stats.gather_table_stats(user, 'mytable');
end;
/

(ACCTNUM,DEPT_NUM) = WINDOW SORT PUSHED RANK

create index mytable_idx on mytable(acctnum, dept_num);

explain plan for
select dept_num, acctnum from
(
    select dept_num, acctnum
        ,row_number() over (partition by dept_num order by acctnum) as row_identifier
    from mytable
)
where row_identifier between 1 and 10;

select * from table(dbms_xplan.display);

Plan hash value: 952182109

------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             |   600K|    22M|       |  1625   (3)| 00:00:23 |
|*  1 |  VIEW                    |             |   600K|    22M|       |  1625   (3)| 00:00:23 |
|*  2 |   WINDOW SORT PUSHED RANK|             |   600K|  4687K|  9424K|  1625   (3)| 00:00:23 |
|   3 |    INDEX FAST FULL SCAN  | MYTABLE_IDX |   600K|  4687K|       |   239   (3)| 00:00:04 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ROW_IDENTIFIER">=1 AND "ROW_IDENTIFIER"<=10)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPT_NUM" ORDER BY "ACCTNUM")<=10)

(DEPT_NUM,ACCTNUM) = WINDOW NOSORT

drop index mytable_idx;
create index mytable_idx on mytable(dept_num, acctnum);

explain plan for
select dept_num, acctnum from
(
    select dept_num, acctnum
        ,row_number() over (partition by dept_num order by acctnum) as row_identifier
    from mytable
)
where row_identifier between 1 and 10;

select * from table(dbms_xplan.display);

Plan hash value: 1773829932

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   600K|    22M|   792   (2)| 00:00:12 |
|*  1 |  VIEW             |             |   600K|    22M|   792   (2)| 00:00:12 |
|*  2 |   WINDOW NOSORT   |             |   600K|  4687K|   792   (2)| 00:00:12 |
|   3 |    INDEX FULL SCAN| MYTABLE_IDX |   600K|  4687K|   792   (2)| 00:00:12 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ROW_IDENTIFIER">=1 AND "ROW_IDENTIFIER"<=10)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPT_NUM" ORDER BY 
              "ACCTNUM")<=10)

Thanks for stopping by to help out. And your solution make sense, I shall create the indexes at my workplace and let you know the results! — Maheswaran Ravisankar, Jul 09 '14 at 13:35

Window functions SORT costly, can we overcome it?

1 Answers1

Linked