Create group index to group by repeating values in a sorted data.table

Question

I want to perform some summary statistics over groups in time series. Here is what I have

test = data.table(ind=c(1:8), val = c(0,0,1,1,1,0,0,1))

    ind val
1:   1   0
2:   2   0
3:   3   1
4:   4   1
5:   5   1
6:   6   0
7:   7   0
8:   8   1

How do I create third column which will be my group index? like this -

    ind val group_id
1:   1   0        1
2:   2   0        1
3:   3   1        2
4:   4   1        2
5:   5   1        2
6:   6   0        3
7:   7   0        3
8:   8   1        4

I'm interested in efficient solution (as real life table is rather big) and would love to avoid straightforward for loop

@mt1022 would you mind posting that as a response? I've been working with data.table for years, but didn't know about this rather new 'convenience' function — RInatM, May 25 '17 at 09:01
@RInatM, sure. I initially thought this is rather simple and should be a dup. Many answers use `rleid`. But I haven't found a proper one. — mt1022, May 25 '17 at 10:04

score 1 · Accepted Answer · edited May 26 '17 at 19:48

1

data.table::rleid is specifically designed for such jobs. It generates run-length-type IDs as described in ?rleid.

In this case you can try:

test[, group_id := rleid(val)]

edited May 26 '17 at 19:48

MichaelChirico

33,841
14
113
198

answered May 25 '17 at 10:03

mt1022

16,834
5
48
71

Create group index to group by repeating values in a sorted data.table

1 Answers1