0

I want to perform some summary statistics over groups in time series. Here is what I have

test = data.table(ind=c(1:8), val = c(0,0,1,1,1,0,0,1))

    ind val
1:   1   0
2:   2   0
3:   3   1
4:   4   1
5:   5   1
6:   6   0
7:   7   0
8:   8   1

How do I create third column which will be my group index? like this -

    ind val group_id
1:   1   0        1
2:   2   0        1
3:   3   1        2
4:   4   1        2
5:   5   1        2
6:   6   0        3
7:   7   0        3
8:   8   1        4

I'm interested in efficient solution (as real life table is rather big) and would love to avoid straightforward for loop

RInatM
  • 1,208
  • 1
  • 17
  • 39
  • 4
    Try `data.table::rleid`: `test[, group_id := rleid(val)]`. – mt1022 May 25 '17 at 08:30
  • @mt1022 would you mind posting that as a response? I've been working with data.table for years, but didn't know about this rather new 'convenience' function – RInatM May 25 '17 at 09:01
  • @RInatM, sure. I initially thought this is rather simple and should be a dup. Many answers use `rleid`. But I haven't found a proper one. – mt1022 May 25 '17 at 10:04

1 Answers1

1

data.table::rleid is specifically designed for such jobs. It generates run-length-type IDs as described in ?rleid.

In this case you can try:

test[, group_id := rleid(val)]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
mt1022
  • 16,834
  • 5
  • 48
  • 71