1

I am trying a clear way to identify groups based on condition on multiple column.

As an example imagine we have a table of points ordered by date and another numeric column:

DT <- data.table(a = paste0("Date", 1:10), V1 = c(1, 2, 11, 9, 11, 11, 2, 2, 11, 11))

         a V1
 1:  Date1  1
 2:  Date2  2
 3:  Date3 11
 4:  Date4  9
 5:  Date5 11
 6:  Date6 11
 7:  Date7  2
 8:  Date8  2
 9:  Date9 11
10: Date10 11

Every time a value in V1 > 10 we have a new group obtaining the following table:

         a V1 ID
 1:  Date1  1  1
 2:  Date2  2  1
 3:  Date3 11  2
 4:  Date4  9  2
 5:  Date5 11  3
 6:  Date6 11  4
 7:  Date7  2  4
 8:  Date8  2  4
 9:  Date9 11  5
10: Date10 11  6

The tables are over 3million point so no loops please. Any help? Thank you very much!!!

Gerald T
  • 704
  • 3
  • 18
  • @42 I knew this question was somewhere on this site, but I had really hard times finding the right keyword, `cumulative` – Gerald T Nov 26 '16 at 12:25

1 Answers1

1

We need cumsum on the logical vector (V1 > 10), add 1 to it in case it starts from 0 and assign (:=) to create the 'ID' column

DT[, ID := cumsum(V1>10)+1]
DT$ID
#[1] 1 1 2 2 3 4 4 4 5 6
akrun
  • 874,273
  • 37
  • 540
  • 662