High performance approach to greatest-n-per-group SQL query

Question

I'm attempting to build an infrastructure for quickly running regressions on demand, pulling apache requests from a database that contains all historic activity on our webservers. To improve coverage by making sure that we still regress requests from our smaller clients, I'd like to ensure a distribution of requests by retrieving at most n (for the sake of this question, say 10) requests for each client.

I found a number of similar questions answered here, the closest of which seemed to be SQL query to return top N rows per ID across a range of IDs, but the answers were for the most part performance-agnostic solutions that I had already tried. For instance, the row_number() analytic function gets us exactly the data we're looking for:

SELECT
    *
FROM
    (
    SELECT
        dailylogdata.*,
        row_number() over (partition by dailylogdata.contextid order by occurrencedate) rn
    FROM
        dailylogdata
    WHERE
        shorturl in (?)
    )
WHERE
    rn <= 10;

However, given that this table contains millions of entries for a given day and this approach necessitates reading all rows from the index that match our selection criteria in order to apply the row_number analytic function, performance is terrible. We end up selecting nearly a million rows, only to throw out the vast majority of them because their row_number exceeded 10. Stats from executing the above query:

|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                            | Name                    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
||   0 | SELECT STATEMENT                     |                         |      1 |        |  12222 |00:09:08.94 |     895K|    584K|    301 |       |       |          |         ||
||*  1 |  VIEW                                |                         |      1 |   4427K|  12222 |00:09:08.94 |     895K|    584K|    301 |       |       |          |         ||
||*  2 |   WINDOW SORT PUSHED RANK            |                         |      1 |   4427K|  13536 |00:09:08.94 |     895K|    584K|    301 |  2709K|   743K|   97M (1)|    4096 ||
||   3 |    PARTITION RANGE SINGLE            |                         |      1 |   4427K|    932K|00:22:27.90 |     895K|    584K|      0 |       |       |          |         ||
||   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA            |      1 |   4427K|    932K|00:22:27.61 |     895K|    584K|      0 |       |       |          |         ||
||*  5 |      INDEX RANGE SCAN                | DAILYLOGDATA_URLCONTEXT |      1 |  17345 |    932K|00:00:00.75 |    1448 |      0 |      0 |       |       |          |         ||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                                                 |
|Predicate Information (identified by operation id):                                                                                                                              |
|---------------------------------------------------                                                                                                                              |
|                                                                                                                                                                                 |
|   1 - filter("RN"<=:SYS_B_2)                                                                                                                                                    |
|   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DAILYLOGDATA"."CONTEXTID" ORDER BY "OCCURRENCEDATE")<=:SYS_B_2)                                                                  |
|   5 - access("SHORTURL"=:P1)                                                                                                                                                    |
|                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

However, if instead we only query for the first 10 results for a specific contextid, we can execute this dramatically faster:

SELECT
    *
FROM
    (
    SELECT
        dailylogdata.*
    FROM
        dailylogdata
    WHERE
        shorturl in (?)
        and contextid = ?
    )
WHERE
    rownum <= 10;

Stats from running this query:

|-------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                           | Name                    | Starts | E-Rows | A-Rows |   A-Time   | Buffers ||
|-------------------------------------------------------------------------------------------------------------------------|
||   0 | SELECT STATEMENT                    |                         |      1 |        |     10 |00:00:00.01 |      14 ||
||*  1 |  COUNT STOPKEY                      |                         |      1 |        |     10 |00:00:00.01 |      14 ||
||   2 |   PARTITION RANGE SINGLE            |                         |      1 |     10 |     10 |00:00:00.01 |      14 ||
||   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| DAILYLOGDATA            |      1 |     10 |     10 |00:00:00.01 |      14 ||
||*  4 |     INDEX RANGE SCAN                | DAILYLOGDATA_URLCONTEXT |      1 |      1 |     10 |00:00:00.01 |       5 ||
|-------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                         |
|Predicate Information (identified by operation id):                                                                      |
|---------------------------------------------------                                                                      |
|                                                                                                                         |
|   1 - filter(ROWNUM<=10)                                                                                                |
|   4 - access("SHORTURL"=:P1 AND "CONTEXTID"=TO_NUMBER(:P2))                                                             |
|                                                                                                                         |
+-------------------------------------------------------------------------------------------------------------------------+

In this instance, Oracle is smart enough to stop retrieving data after getting 10 results. I could gather a complete set of contextids and programatically generate a query consisting of one instance of this query for each contextid and union all the whole mess together, but given the sheer number of contextids, we might run into an internal Oracle limitation, and even if not, this approach reeks of kludge.

Does anyone know of an approach that maintains the simplicity of the first query, while retaining performance commensurate with the second query? Also note that I don't actually care about retrieving a stable set of rows; so long as they satisfy my criteria, they're fine for the purposes of a regression.

Edit: Adam Musch's suggestion did the trick. I'm appending performance results with his changes up here since I can't fit them in a comment response to his answer. I'm also using a larger data set for testing this time, here are the (cached) stats from my original row_number approach for comparison:

|-------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------|
||   0 | SELECT STATEMENT              |                   |      1 |        |  12624 |00:00:22.34 |    1186K|    931K|       |       |          ||
||*  1 |  VIEW                         |                   |      1 |   1163K|  12624 |00:00:22.34 |    1186K|    931K|       |       |          ||
||*  2 |   WINDOW NOSORT               |                   |      1 |   1163K|   1213K|00:00:21.82 |    1186K|    931K|  3036M|    17M|          ||
||   3 |    TABLE ACCESS BY INDEX ROWID| TWTEST            |      1 |   1163K|   1213K|00:00:20.41 |    1186K|    931K|       |       |          ||
||*  4 |     INDEX RANGE SCAN          | TWTEST_URLCONTEXT |      1 |   1163K|   1213K|00:00:00.81 |    8568 |      0 |       |       |          ||
|-------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                 |
|Predicate Information (identified by operation id):                                                                                              |
|---------------------------------------------------                                                                                              |
|                                                                                                                                                 |
|   1 - filter("RN"<=10)                                                                                                                          |
|   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "CONTEXTID" ORDER BY  NULL )<=10)                                                                 |
|   4 - access("SHORTURL"=:P1)                                                                                                                    |
+-------------------------------------------------------------------------------------------------------------------------------------------------+

I've taken the liberty of boiling down Adam's suggestion a bit; here's the modified query...

select
    *
from
    twtest
where
    rowid in (
    select
            rowid
    from (
            select
                    rowid,
                    shorturl,
                    row_number() over (partition by shorturl, contextid
                                                      order by null) rn
            from
                    twtest
    )
    where rn <= 10
    and shorturl in (?)
);

...and stats from its (cached) evaluation:

|--------------------------------------------------------------------------------------------------------------------------------------|
|| Id  | Operation                   | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem ||
|--------------------------------------------------------------------------------------------------------------------------------------|
||   0 | SELECT STATEMENT            |                   |      1 |        |  12624 |00:00:01.33 |   19391 |       |       |          ||
||   1 |  NESTED LOOPS               |                   |      1 |      1 |  12624 |00:00:01.33 |   19391 |       |       |          ||
||   2 |   VIEW                      | VW_NSO_1          |      1 |   1163K|  12624 |00:00:01.27 |    6770 |       |       |          ||
||   3 |    HASH UNIQUE              |                   |      1 |      1 |  12624 |00:00:01.27 |    6770 |  1377K|  1377K| 5065K (0)||
||*  4 |     VIEW                    |                   |      1 |   1163K|  12624 |00:00:01.25 |    6770 |       |       |          ||
||*  5 |      WINDOW NOSORT          |                   |      1 |   1163K|   1213K|00:00:01.09 |    6770 |   283M|  5598K|          ||
||*  6 |       INDEX RANGE SCAN      | TWTEST_URLCONTEXT |      1 |   1163K|   1213K|00:00:00.40 |    6770 |       |       |          ||
||   7 |   TABLE ACCESS BY USER ROWID| TWTEST            |  12624 |      1 |  12624 |00:00:00.04 |   12621 |       |       |          ||
|--------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                      |
|Predicate Information (identified by operation id):                                                                                   |
|---------------------------------------------------                                                                                   |
|                                                                                                                                      |
|   4 - filter("RN"<=10)                                                                                                               |
|   5 - filter(ROW_NUMBER() OVER ( PARTITION BY "SHORTURL","CONTEXTID" ORDER BY NULL NULL )<=10)                                       |
|   6 - access("SHORTURL"=:P1)                                                                                                         |
|                                                                                                                                      |
|Note                                                                                                                                  |
|-----                                                                                                                                 |
|   - dynamic sampling used for this statement (level=2)                                                                               |
|                                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+

As advertised, we're only accessing the dailylogdata table for the fully-filtered rows. I'm concerned that it appears to still be doing a full scan of the urlcontext index based on the number of rows it claims to be selecting (1213K), but given that it's using only using 6770 buffers (this number remains constant even if I increase the number of context-specific results) this may be misleading.

Have you tried other ways/queries that achieve the same resultset (greatest-N-per-group)? — ypercubeᵀᴹ, Feb 28 '12 at 16:50
Your second query does not get "the first 10 rows", it simply gets 10 rows, semi-randomly. In order to make it truly equivalent to one partition of the first query, you need to add an `order by` to the inner query. — Allan, Feb 28 '12 at 17:10
@ypercube for the most part I've tried other analytic functions with similar performance results. I tried some more esoteric stuff like the [first](http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions056.htm#sthref1389) function which wouldn't return as many rows as I needed, but I thought it might perform better than this query. Like the other analytic functions, it scanned way more of the index than needed, however. — Trevor, Feb 28 '12 at 18:08
@Allan I agree that the two queries aren't exactly equivalent; as I mentioned later in the question, I don't actually care about the query returning a stable or sorted set of rows. However, the syntax of the row_number() function requires an `order by` clause, so the query gets it. I'd be perfectly happy without it, were it possible. — Trevor, Feb 28 '12 at 18:14
@Trevor: So, you just want 10 results per `contextid` (from the many, possibly million rows)? How many different `contextid` are there in the table? — ypercubeᵀᴹ, Feb 28 '12 at 18:52
@Trevor: Also, are `shorturl` and `contextid` dependent or independent values - does each `contextid` exist in one and only one `shorturl`? — Adam Musch, Mar 02 '12 at 20:19

score 4 · Accepted Answer · answered Mar 02 '12 at 21:52

This is kind of a janky solution, but it seems to do what you want: shortcut the index scan as soon as possible, and not read data until it's been qualified both by filtering conditioning and top-n query criteria.

Note that it was tested with a shorturl = condition, not a shorturl IN condition.

with rowid_list as
(select rowid
   from (select *
           from (select rowid,
                        row_number() over (partition by shorturl, contextid
                                           order by null) rn
                   from dailylogdata
                )
          where rn <= 10
        )
  where shorturl = ? 
)
select * 
  from dailylogdata
 where rowid in (select rowid from rowid_list)

The with clause grabs the first 10 rowids filtering a WINDOW NOSORT for each unique combination of shorturl and contextid that meets your criteria. Then it loops over that set of rowids, fetching each by rowid.

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     1 |   286 |  1536   (1)| 00:00:19 |
|   1 |  NESTED LOOPS               |                      |     1 |   286 |  1536   (1)| 00:00:19 |
|   2 |   VIEW                      | VW_NSO_1             |   136K|  1596K|   910   (1)| 00:00:11 |
|   3 |    HASH UNIQUE              |                      |     1 |  3326K|            |          |
|*  4 |     VIEW                    |                      |   136K|  3326K|   910   (1)| 00:00:11 |
|*  5 |      WINDOW NOSORT          |                      |   136K|  2794K|   910   (1)| 00:00:11 |
|*  6 |       INDEX RANGE SCAN      | TABLE_REDACTED_INDEX |   136K|  2794K|   910   (1)| 00:00:11 |
|   7 |   TABLE ACCESS BY USER ROWID| TABLE_REDACTED       |     1 |   274 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("RN"<=10)
   5 - filter(ROW_NUMBER() OVER ( PARTITION BY "CLIENT_ID","SCE_ID" ORDER BY NULL NULL
              )<=10)
   6 - access("TABLE_REDACTED"."SHORTURL"=:b1)

Looks like a nice solution, and it substantially outperforms my initial query. I've updated the question with performance measurements of the old query and a slightly-modified version of yours. I appreciate the help! — Trevor, Mar 19 '12 at 05:53

score 0 · Answer 2 · answered Feb 28 '12 at 16:04

0

It seems to be the sort taking all the time. Is occurrenceDate your clustered index, and if not, is it much quicker if you change to order by your clustered index? I.e. if it's clustered by a sequential id, then order by that.

answered Feb 28 '12 at 16:04

Tim Rogers

21,297
6
52
68

I disagree, the first key in the index should be the field that is being partitioned by `contextid`, then the order column `occurrenceDate` - this is the order in which the data needs to be accessed – J Cooper Feb 28 '12 at 16:16
@JCooper I'm not suggesting changing the indexes here, just changing the query. And if it's log data, you wouldn't want to cluster by `contextid` as the index needs to be chronological for fast INSERTs. – Tim Rogers Feb 28 '12 at 16:22
@TimRogers This table doesn't possess a clustered index (and I was previously unfamiliar with the term), though from what I'm reading, they might help. Converting this to be an index-organized-table on the occurrencedate column would accomplish much the same goal, correct? The current index I'm using is on (shorturl, contextid), which I'm open to changing. My concern is that due to the use of the row_number function Oracle might still not know to call it quits after selecting 10 per contextid regardless of data structure. I'll do some testing to see if this helps. – Trevor Feb 28 '12 at 19:26
@Trevor You definitely want a clustered index on either the insert date, or a sequential id. If you order a query by the clustered index, there is effectively no work to do (the data is already ordered physically on the disk using the clustered index). If you order by anything else, and there are millions of rows, there is a very large amount of work to do: the db can't fit it all in memory, so it has to sort the data in chunks, spool it to temporary tables on disk, then merge the results. – Tim Rogers Feb 29 '12 at 09:26

score 0 · Answer 3 · answered Feb 28 '12 at 16:44

0

Last time I simply cached last most interesting rows in a smaller table. with my data distribution it was cheaper to update the cache table on every insert rather than query the bulk table.

answered Feb 28 '12 at 16:44

b0rg

1,879
12
17

score 0 · Answer 4 · answered Feb 28 '12 at 19:15

I think you should also check other ways/queries to achieve the same result set.

Self-JOIN / GROUP BY

SELECT
    d.*
  , COUNT(*) AS rn

FROM 
        dailylogdata AS d 
    LEFT OUTER JOIN
        dailylogdata AS d2 
            ON  d.contextid = d2.contextid 
            AND d.occurrencedate >= d2.occurrencedate) 
            AND d2.shorturl IN (?)

WHERE
    d.shorturl IN (?)

GROUP BY 
    d.* 

HAVING 
    COUNT(*) <= 10

And another one which I have no idea if it works correctly:

SELECT
    d.*
  , COUNT(*) AS rn

FROM 
        ( SELECT DISTINCT
              contextid
          FROM 
              dailylogdata 
          WHERE
              shorturl IN (?)
        ) AS dd 
    JOIN
        dailylogdata AS d
            ON  d.PK IN 
                ( SELECT
                      d10.PK
                  FROM
                      dailylogdata AS d10  
                  WHERE
                      d10.contextid = dd.contextid 
                    AND
                      d10.shorturl IN (?)
                    AND
                      rownum <= 10
                  ORDER BY 
                      d10.occurrencedate
                )

That self-join is a really bad idea. It'll join each row to every row that follows it. 100 rows with the same `contextid` will result in 4900 rows in the result set. — Allan, Feb 28 '12 at 20:54
@Allan: I agree 100%. The self-join works better when there are many different contextids in the table (few rows per contextid). — ypercubeᵀᴹ, Feb 28 '12 at 20:56

High performance approach to greatest-n-per-group SQL query

4 Answers4

Linked