53

I have below table

create table #t (Id int, Name char)

insert into #t values
(1, 'A'),
(2, 'A'),
(3, 'B'),
(4, 'B'),
(5, 'B'),
(6, 'B'),
(7, 'C'),
(8, 'B'),
(9, 'B')

I want to count consecutive values in name column

+------+------------+
| Name | Repetition |
+------+------------+
| A    |          2 |
| B    |          4 |
| C    |          1 |
| B    |          2 |
+------+------------+

The best thing I tried is:

select Name
, COUNT(*) over (partition by Name order by Id) AS Repetition
from #t
order by Id

but it doesn't give me expected result

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
FLICKER
  • 6,439
  • 4
  • 45
  • 75

3 Answers3

57

One approach is the difference of row numbers:

select name, count(*) 
from (select t.*,
             (row_number() over (order by id) -
              row_number() over (partition by name order by id)
             ) as grp
      from t
     ) t
group by grp, name;

The logic is easiest to understand if you run the subquery and look at the values of each row number separately and then look at the difference.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 2
    With `order by max(id)` added, the result would look exactly like in the OP's post (involving the order of the records). – King King Apr 29 '16 at 07:11
  • 1
    @KingKing, thanks for your hint. Yes, it makes it exactly like what I need. – FLICKER Apr 29 '16 at 16:41
  • Why my query doesn't give me the result, I expect when I partition by Name, the count(*) should reset when the values of Name changes. I appreciate if you explain that. Thanks again. – FLICKER Apr 29 '16 at 16:43
  • @FLICKER . . . I think if you run your query, you should understand. It enumerates the names regardless of gaps between rows. That is, it ignores gaps. – Gordon Linoff May 01 '16 at 02:07
  • This solution doesn't work for this example: A, A, B, B, B, A, B – Ahmadreza Sep 13 '20 at 07:24
  • @AhmadRezaSaboor . . . It certainly should. If you have a question, though, you should ask it as a new question. – Gordon Linoff Sep 13 '20 at 14:58
  • 1
    @GordonLinoff I don't think a new question is required. Just calculate the grp values for my example and you'll understand where the issue is. The grp values are: 0, 0, 2, 2, 2, 3, 3. – Ahmadreza Sep 14 '20 at 12:29
  • @AhmadRezaSaboor Would wrapping the query in another one and then partitioning by id and grp not fix this issue? Something like select *,count(1) over (partition by id,grp) as correct_consec from ... – Alex M Apr 22 '21 at 20:38
  • This is the best solution. How can we do this using self join? Someone asked me this question in an interview. – vijayraj34 Jan 16 '22 at 09:01
  • 1
    @vijayraj34 i don't think self-join would give you this particular result (consecutive occurrences of any value). Self-join CAN give you the extremity (max/min, first/last, etc.) record of each record group: https://stackoverflow.com/a/8749095/258598 – cautionbug Feb 08 '22 at 14:31
  • @AhmadReza I was confused at some point as well. Your group values calculations were correct, but note that the final, outermost query would group by both group value and name. So it would work. – dz902 May 27 '22 at 15:34
  • This doesn't work for all cases. Fails for this: insert into aberration_detection_system.temp_x values (1, '0'), (2, '3'), (3, '2'), (4, '2'), (5, '2'), (6, '2'), (7, '2'), (8, '1'), (9, '0') ; – Satish Srinivas Nov 20 '22 at 16:15
16

You could use windowed functions like LAG and running total:

WITH cte AS (
 SELECT Id, Name, grp = SUM(CASE WHEN Name = prev THEN 0 ELSE 1 END) OVER(ORDER BY id)
 FROM (SELECT *, prev = LAG(Name) OVER(ORDER BY id) FROM t) s
)
SELECT name, cnt = COUNT(*)
FROM cte
GROUP BY grp,name
ORDER BY grp;

db<>fiddle demo

The first cte returns group number:

+-----+-------+-----+
| Id  | Name  | grp |
+-----+-------+-----+
|  1  | A     |   1 |
|  2  | A     |   1 |
|  3  | B     |   2 |
|  4  | B     |   2 |
|  5  | B     |   2 |
|  6  | B     |   2 |
|  7  | C     |   3 |
|  8  | B     |   4 |
|  9  | B     |   4 |
+-----+-------+-----+

And main query groups it based on grp column calculated earlier:

+-------+-----+
| name  | cnt |
+-------+-----+
| A     |   2 |
| B     |   4 |
| C     |   1 |
| B     |   2 |
+-------+-----+
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    This solution works perfectly fine, while the accepted answer fails for some cases. Thanks. @FLICKER You should select this as accepted answer instead. – Satish Srinivas Nov 20 '22 at 16:39
3

I have use Recursive CTE and minimise the use of row_number,also avoid count(*).

I think it will perform better,but in real world it depend what else filter you put to minimise number of rows affected.

If ID is having discreet values then One extra CTE will be use to generate continuous id.

 ;With CTE2 as
(
select ROW_NUMBER()over(order by id) id, name,1 Repetition ,1 Marker  from @t
)
, CTE as
(
select top 1 cast(id as int) id, name,1 Repetition ,1 Marker  from CTE2 order by id

union all

select a.id, a.name
, case when a.name=c.name then Repetition +1 else 1 end  
, case when a.name=c.name then c.Marker else  Marker+1 end
from @t a
inner join CTE c on a.id=c.id+1

)
,CTE1 as
(select *,ROW_NUMBER()over(partition by marker order by id desc)rn from cte c
)
select Name,Repetition from cte1 where rn=1
KumarHarsh
  • 5,046
  • 1
  • 18
  • 22