5

I am trying to get a dense rank to group sets of data together. In my table I have ID, GRP_SET, SUB_SET, and INTERVAL which simply represents a date field. When records are inserted using an ID they get inserted as GRP_SETs of 3 rows shown as a SUB_SET. As you can see when inserts happen the interval can change slightly before it finishes inserting the set.

Here is some example data and the DRANK column represents what ranking I'm trying to get.

with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)

select * from q

Example Data

ID GRP_SET SUBSET INTERVAL DRANK
1  a       1      123      1
1  a       2      123      1
1  a       3      124      1
1  b       1      234      2
1  b       3      235      2
1  b       2      235      2
1  a       1      331      3
1  a       2      331      3
1  a       3      331      3

Here is the query I Have that gets close but I seem to need something like a:

  • Partition By: ID
  • Order within partition by: ID, Interval
  • Change Rank when: ID, GRP_SET (change)

select
   id, GRP_SET, SUB_SET, interval,
   DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
   id, interval
David Faber
  • 12,277
  • 2
  • 29
  • 40

2 Answers2

2

This might work for you. The complicating factor is that you want the same "DENSE RANK" for intervals 123 and 124 and for intervals 234 and 235. So we'll truncate them to the nearest 10 for purposes of ordering the DENSE_RANK() function:

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
  FROM q

Please see SQL Fiddle demo here.

If you want the intervals to be even closer together in order to be grouped together, then you can multiply the value before truncating. This would group them by 3s (but maybe you don't need them so granular):

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
  FROM q
David Faber
  • 12,277
  • 2
  • 29
  • 40
2

Using the MODEL clause

Behold for you are pushing your requirements beyond the limits of what is easy to express in "ordinary" SQL. But luckily, you're using Oracle, which features the MODEL clause, a device whose mystery is only exceeded by its power (excellent whitepaper here). You shall write:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)
MODEL PARTITION BY (id)
      DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
      MEASURES (grp_set, sub_set, interval, drank)
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Proof on SQLFiddle

Explanation:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  -- Here, we initialise your "dense rank" to 1
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)

-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)

-- We generate row numbers for all columns ordered by interval and sub_set,
-- such that we can then access row numbers in that particular order
      DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)

-- These are the columns that we want to generate from the MODEL clause
      MEASURES (grp_set, sub_set, interval, drank)

-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Of course, this only works if there are no interleaving events, i.e. there is no other grp_set than a between 123 and 124

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • This was the best solution for solving my problem as it didn't require any assumptions as to the time delay between the sets of data. Thanks allot for the help Looks like I have a few hours of reading to do on the MODEL feature in Oracle. – Cullen Burton Jan 21 '15 at 14:23
  • @CullenBurton: [This whitepaper is the best resource I've found so far](http://www.oracle.com/technetwork/middleware/bi-foundation/10gr1-twp-bi-dw-sqlmodel-131067.pdf). But note that my solution still made an assumption: The assumption that two `grp_set` will never have interleaving `interval` values. If that happens, you can still use `MODEL`, but it will be a bit more complex to write... – Lukas Eder Jan 21 '15 at 15:14
  • 1
    Found a more comprehensive document here with a TON of features: http://docs.oracle.com/cd/B19306_01/server.102/b14223/sqlmodel.htm – Cullen Burton Jan 21 '15 at 16:28
  • @CullenBurton: Perfect. That ought to be reading material for 1-2 days :) – Lukas Eder Jan 21 '15 at 16:30
  • 1
    @CullenBurton: And in case you're on 12c, don't forget to learn also about the new [`MATCH_RECOGNIZE`](http://docs.oracle.com/database/121/SQLRF/statements_10002.htm#SQLRF01702) clause. [Example here](http://docs.oracle.com/database/121/SQLRF/statements_10002.htm#SQLRF01702). It's an entirely different planet (and might be even more suitable to your problem - I haven't gotten around to learn it myself, so I can't use it in an answer yet) – Lukas Eder Jan 21 '15 at 16:32