How to group by continuous records in SQL

Question

My table has these records

ID  Colour
------------
 1   Red
 2   Red
 3   Red
 4   Red
 5   Red
 6   Green
 7   Green
 8   Green
 9   Green
10   Red
11   Red
12   Red
13   Red
14   Green
15   Green
16   Green
17   Blue
18   Blue
19   Red
20   Blue

I can group by colour easily like this

SELECT Colour, MIN(ID) AS iMin, MAX(ID) AS iMax
FROM MyTable
GROUP BY Colour

This would return this result

Colour     iMin     iMax
-------------------------
Red        1        19
Green      6        16
Blue       17       20

But this is not what I want as Red does not go all the way from 1 to 19, Green breaks the sequence.

The result should be like this

Colour     iMin     iMax
------------------------
Red        1        5
Green      6        9
Red        10       13
Green      14       16
Blue       17       18
Red        19       19
Blue       20       20

I managed to do this by cursor, but wonder if there is a more efficient way to do that

score 4 · Accepted Answer · answered May 28 '20 at 01:04

This is a gaps-and-islands problem. Assuming that id is continously incrementing, you can use the difference between row_number() to define groups of "adjacent" records having the same colour:

select 
    colour, 
    min(id) iMin,
    max(id) iMax
from (
    select t.*, row_number() over(partition by colour order by id) rn
    from mytable t
) t
group by colour, id - rn
order by min(id)

Demo on DB Fiddle:

colour | iMin | iMax
:----- | ---: | ---:
Red    |    1 |    5
Green  |    6 |    9
Red    |   10 |   13
Green  |   14 |   16
Blue   |   17 |   18
Red    |   19 |   19
Blue   |   20 |   20

score 3 · Answer 2 · answered May 28 '20 at 01:04

This is a gap and islands problem. You can solve this with the difference of row numbers:

select colour, min(id), max(id)
from (select t.*,
             row_number() over (order by id) as seqnum,
             row_number() over (partition by colour order by id) as seqnum_c
      from t
     ) t
group by colour, (seqnum - seqnum_c);

Here is a db<>fiddle.

It is a little hard to explain how this works. However, if you look at the results of the subquery, you will see how the difference of row numbers identifies the adjacent colours.

score 1 · Answer 3 · answered May 28 '20 at 03:03

The query is valid whether or not the id column is an integer and the value of the id column is continuous.

;with c0 as(
select id, color,
       ROW_NUMBER() over(order by id)*
       (case when color <> LAG(color, 1, '') over(order by id) then 1 else 0 end) as color_id
from #temp
), c1 as(
select id, color, color_id, SUM(color_id) over(order by id) as color_gid
from c0
)
select color, MIN(id) as idMin, MAX(id) as idMax
from c1
group by color, color_gid

It can be extended to sort by column a, group by consecutive values of column b, and find aggregate values for column c, like this:

;with c0 as(
select C, B,
       ROW_NUMBER() over(order by A)*
       (case when B <> LAG(B, 1, '') over(order by A) then 1 else 0 end) as B_id
from TableName
), c1 as(
select C, B, B_id, SUM(B_id) over(order by A) as B_gid
from c0
)
select B, MIN(C) as CMin, MAX(C) as CMax
from c1
group by B, B_gid

How to group by continuous records in SQL

3 Answers3

Linked