3

I need to extract continous ranges from a table based on consecutive numbers (column N) and same "category" these numbers relate to (column C below). Graphically it looks like this:

 N  C  D
--------
 1  x  a           C  N1  N2  D1  D2
 2  x  b          ------------------
 3  x  c           x   1   4   a   d     (continuous range with same N)
 4  x  d    ==>    x   6   7   e   f     (new range because "5" is missing)
 6  x  e           y   8  10   g   h     (new range because C changed to "y")
 7  x  f
 8  y  g
 9  y  h
10  y  i

SQL Server is 2005. Thanks.

J Cooper
  • 4,828
  • 3
  • 36
  • 39
zzandy
  • 2,263
  • 1
  • 23
  • 43
  • Is it acceptable to do this through a stored procedure ? – berty Feb 09 '12 at 15:16
  • If you have access to the SQL Cookbook, this is recipe 10.3. http://www.amazon.com/Cookbook-Cookbooks-OReilly-Anthony-Molinaro/dp/0596009763/ref=sr_1_1?ie=UTF8&qid=1328800970&sr=8-1 It's quite involved. – Matt Fenwick Feb 09 '12 at 15:23
  • @MattFenwick: Thanks, looks like that recepie can do what I need, after I can chew it over. – zzandy Feb 09 '12 at 15:50

4 Answers4

4
DECLARE @myTable Table
(
    N INT,
    C CHAR(1),
    D CHAR(1)
)
INSERT INTO @myTable(N,C,D) VALUES(1,  'x', 'a');
INSERT INTO @myTable(N,C,D) VALUES(2,  'x', 'b');
INSERT INTO @myTable(N,C,D) VALUES(3,  'x', 'c');
INSERT INTO @myTable(N,C,D) VALUES(4,  'x', 'd');
INSERT INTO @myTable(N,C,D) VALUES(6,  'x', 'e');
INSERT INTO @myTable(N,C,D) VALUES(7,  'x', 'f');
INSERT INTO @myTable(N,C,D) VALUES(8,  'y', 'g');
INSERT INTO @myTable(N,C,D) VALUES(9,  'y', 'h');
INSERT INTO @myTable(N,C,D) VALUES(10, 'y', 'i');


WITH StartingPoints AS(

    SELECT A.*, ROW_NUMBER() OVER(ORDER BY A.N) AS rownum
    FROM @myTable AS A
    WHERE NOT EXISTS(
        SELECT *
        FROM @myTable B
        WHERE B.C = A.C
          AND B.N = A.N - 1
    )
 ),
 EndingPoints AS(
    SELECT A.*, ROW_NUMBER() OVER(ORDER BY A.N) AS rownum
    FROM @myTable AS A
    WHERE NOT EXISTS (
        SELECT *
        FROM @myTable B
        WHERE B.C = A.C
          AND B.N = A.N + 1
    )
 ) 
SELECT StartingPoints.C,
       StartingPoints.N AS [N1],
       EndingPoints.N AS [N2],
       StartingPoints.D AS [D1],
       EndingPoints.D AS [D2] 
FROM StartingPoints
JOIN EndingPoints ON StartingPoints.rownum = EndingPoints.rownum

Results:

C    N1          N2          D1   D2
---- ----------- ----------- ---- ----
x    1           4           a    d
x    6           7           e    f
y    8           10          g    i
J Cooper
  • 4,828
  • 3
  • 36
  • 39
1

The RANK function is a safer bet than ROW_NUMBER, in case any N values are duplicated, as in the following example:

declare @ncd table(N int, C char, D char);

insert into @ncd
select 1,'x','a' union all
select 2,'x','b' union all
select 3,'x','c' union all
select 4,'x','d' union all
select 4,'x','e' union all
select 7,'x','f' union all
select 8,'y','g' union all
select 9,'y','h' union all
select 10,'y','i' union all
select 10,'y','j';

with a as (
    select *
    , r = N-rank()over(partition by C order by N)
    from @ncd
)
select C=MIN(C)
, N1=MIN(N)
, N2=MAX(N)
, D1=MIN(D)
, D2=MAX(D)
from a
group by r;

Result, which correctly withstands the duplicated 4 and 10:

C    N1          N2          D1   D2
---- ----------- ----------- ---- ----
x    1           4           a    e
x    7           7           f    f
y    8           10          g    j
John Dewey
  • 6,985
  • 3
  • 22
  • 26
1

Using this answer as a starting point, I ended up with the following:

;
WITH data (N, C, D) AS (
  SELECT 1,  'x', 'a' UNION ALL
  SELECT 2,  'x', 'b' UNION ALL
  SELECT 3,  'x', 'c' UNION ALL
  SELECT 4,  'x', 'd' UNION ALL
  SELECT 6,  'x', 'e' UNION ALL
  SELECT 7,  'x', 'f' UNION ALL
  SELECT 8,  'y', 'g' UNION ALL
  SELECT 9,  'y', 'h' UNION ALL
  SELECT 10, 'y', 'i'
),
ranked AS (
  SELECT
    curr.*,
    Grp     = curr.N - ROW_NUMBER() OVER (PARTITION BY curr.C ORDER BY curr.N),
    IsStart = CASE WHEN pred.C IS NULL THEN 1 ELSE 0 END,
    IsEnd   = CASE WHEN succ.C IS NULL THEN 1 ELSE 0 END
  FROM data AS curr
    LEFT JOIN data AS pred ON curr.C = pred.C AND curr.N = pred.N + 1
    LEFT JOIN data AS succ ON curr.C = succ.C AND curr.N = succ.N - 1
)
SELECT
  C,
  N1 = MIN(N),
  N2 = MAX(N),
  D1 = MAX(CASE IsStart WHEN 1 THEN D END),
  D2 = MAX(CASE IsEnd   WHEN 1 THEN D END)
FROM ranked
WHERE 1 IN (IsStart, IsEnd)
GROUP BY C, Grp
Community
  • 1
  • 1
Andriy M
  • 76,112
  • 17
  • 94
  • 154
0

Write a stored procedure. It will create and fill a temporary table witch will contain C, N1, N2, D1 and D2 columns.

  • Create the temporary table
  • use a cursor to loop on entries in table containing N, C, D ordered by N
  • use a variable to detect a new range (Ni < N(i-1)-1) and to store N1, N2, D1 and D2
  • INSERT into the temporary table for each range detected (new range detected or and of the cursor)

Tell me if you need a code example.

berty
  • 2,178
  • 11
  • 19
  • Thank you, I'll work out the code. I was hoping for non-cursor solution. The source table is potentially millions of rows long. – zzandy Feb 09 '12 at 15:27