I have a data-set with a timestamp column and multiple identifier columns. I want to condense it to a single row for each "block" of adjacent rows with equal identifiers, when ordered by the timestamp. The min and max timestamp for each block is required.
Source Data:
TSTAMP ID1 ID2
t1 A B <= start of new block
t2 A B
t3 C D <= start of new block
t4 E F <= start of new block
t5 E F
t6 E F
t7 A B <= start of new block
t8 G H <= start of new block
Desired Result:
MIN_TSTAMP MAX_TSTAMP ID1 ID2
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
I thought this was ripe for a window-ing analytic function but I cannot partition without grouping ALL equal combinations of IDn
- rather than only those in adjacent rows, when ordered by timestamp.
A workaround is to create a key column first in an in-line view that I can later group by i.e. with same value for each row in the block and different value for each block. I can do this using LAG analytic function to compare row values and then calling a PL/SQL function to return nextval/currval values of a sequence (calling nextval/currval directly in the SQL is restricted in this context).
select min(ilv.tstamp), max(ilv.tstamp), id1, id2
from (
select case when (id1 != lag(id1,1,'*') over (partition by (1) order by tstamp)
or id2 != lag(id2,1,'*') over (partition by (1) order by tstamp))
then
pk_seq_utils.gav_get_nextval
else
pk_seq_utils.gav_get_currval
end ident, t.*
from tab1 t
order by tstamp) ilv
group by ident, id1, id2
order by 1;
where the gav_get_xxx
functions simply return currval/nextval from a sequence.
But I would like to use SQL only and avoid PL/SQL (as I could also write this easily in PL/SQL and pipe out the result-rows from a pipeline function).
Any ideas?
Thanks.