2

I have an Oracle 11.2 table that looks like the following.

MSG_ID JOB_ID STATUS   CREATION_DATE
1      101    SUB      03/30/2011 14:39:52
2      101    SUB      03/30/2011 14:44:37
3      101    RCVD     03/31/2011 14:41:49
4      101    ACCPTD   03/31/2011 14:59:51
5      101    DESIGN   03/31/2011 17:44:37
6      101    REVIEW   03/31/2011 17:44:50
7      101    (null)   04/01/2011 07:51:41
8      101    REJECTED 04/01/2011 07:56:56
9      101    (null)   04/01/2011 16:36:19
10     101    REVIEW   04/01/2011 16:43:49
11     101    APPROVED 04/04/2011 07:55:45
12     101    (null)   04/04/2011 08:14:23
13     101    REJECTED 04/04/2011 14:46:17
14     101    REVIEW   04/04/2011 14:50:50
15     101    (null)   04/04/2011 14:51:27
16     101    REVIEW   04/04/2011 14:57:50
17     101    (null)   04/04/2011 15:34:50
18     101    APPROVED 04/04/2011 15:41:21
19     101    (null)   04/04/2011 15:41:23
20     101    REJECTED 04/04/2011 15:41:24
21     101    REVIEW   04/04/2011 16:07:52
22     101    (null)   04/05/2011 08:50:35
23     101    APPROVED 04/05/2011 10:54:51
24     101    PROD     04/05/2011 11:08:50
25     101    COMPLET  04/06/2011 20:09:49

I am trying to group adjacent(with respect to creation date) message statuses and the time interval between each unique status combination. Null statuses can be ignored. The result would look like the following.

JOB_ID FROM_STATUS TO_STATUS TIME_INTERVAL
101    SUB         RCVD      sum of time intervals
101    RCVD        ACCPTD    sum of time intervals
101    ACCPTD      DESIGN    sum of time intervals
101    DESIGN      REVIEW    sum of time intervals
101    REVIEW      REJECTED  sum of time intervals
101    REJECTED    REVIEW    sum of time intervals
101    REVIEW      APPROVED  sum of time intervals
101    APPROVED    REJECTED  sum of time intervals
101    APPROVED    PROD      sum of time intervals
101    PROD        COMPLETE  sum of time intervals

Could someone help with the SQL needed to do this.

Thank you in advance

slugster
  • 49,403
  • 14
  • 95
  • 145
Joe
  • 193
  • 3
  • 4
  • 14

1 Answers1

3

This is a variation on a gaps-and-islands problem. The first step is to identify the contiguous groups, which you can do like this:

select msg_id, job_id, status, creation_date,
    row_number() over (partition by job_id order by creation_date)
        - row_number() over (partition by job_id, status
            order by creation_date) as chain
from t42
where status is not null

...which gives you a unique 'chain' number for each block with the same job_id and status value. You can then collapse those to get a single row for each 'chain', since you only seem to be interested in the earliest creation_date for each contiguous block of statuses (I think).

Then you can use lead to peek ahead at the next block and work out the time difference, and finally - if I've understood your final 'sum' requirement - group the intervals.

So altogether:

select job_id, from_status, to_status, sum(time_interval) as time_interval
from (
    select job_id, status as from_status, status_start,
        lead(status) over (partition by job_id order by status_start)
            as to_status,
        lead(status_start) over (partition by job_id order by status_start)
            - status_start as time_interval
    from (
        select distinct job_id, status, chain,
            min(creation_date) over (partition by job_id, status, chain
                order by chain) as status_start
        from (
            select msg_id, job_id, status, creation_date,
                row_number() over (partition by job_id order by creation_date)
                    - row_number() over (partition by job_id, status
                        order by creation_date) as chain
            from t42
            where status is not null
        )
    )
)
where to_status is not null
group by job_id, from_status, to_status
order by job_id, min(status_start);

With you data that gives:

    JOB_ID FROM_STA TO_STATU TIME_INTERVAL
---------- -------- -------- -------------
       101 SUB      RCVD        1.00135417
       101 RCVD     ACCPTD      .012523148
       101 ACCPTD   DESIGN      .114421296
       101 DESIGN   REVIEW      .000150463
       101 REVIEW   REJECTED    .591736111
       101 REJECTED REVIEW      .387430556
       101 REVIEW   APPROVED    3.45099537
       101 APPROVED REJECTED    .285127315
       101 APPROVED PROD        .009710648
       101 PROD     COMPLET     1.37568287

10 rows selected.

The time_interval is in days, but you can manipulate that however you want; since you mention intervals, you can use Oracle intervals:

select job_id, from_status, to_status,
    numtodsinterval(sum(time_interval) , 'DAY') as time_interval
from (
...

    JOB_ID FROM_STA TO_STATU TIME_INTERVAL
---------- -------- -------- -----------------------------
       101 SUB      RCVD     +000000001 00:01:57.000000000
       101 RCVD     ACCPTD   +000000000 00:18:02.000000000
       101 ACCPTD   DESIGN   +000000000 02:44:46.000000000
       101 DESIGN   REVIEW   +000000000 00:00:13.000000000
       101 REVIEW   REJECTED +000000000 14:12:06.000000000
       101 REJECTED REVIEW   +000000000 09:17:54.000000000
       101 REVIEW   APPROVED +000000003 10:49:26.000000000
       101 APPROVED REJECTED +000000000 06:50:35.000000000
       101 APPROVED PROD     +000000000 00:13:59.000000000
       101 PROD     COMPLET  +000000001 09:00:59.000000000

10 rows selected.
Community
  • 1
  • 1
Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • @Alex, very nice answer indeed. I am reivewing it, but so far seems exactly what I need. Thank you. – Joe Mar 27 '13 at 13:38
  • @Joe - glad to hear it helped. Please remember to upvote useful answers to your questions and [accept](http://meta.stackexchange.com/a/5235) those that solve your problem. – Alex Poole Apr 01 '13 at 16:24