I am working with data structured somewhat like the following:

testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))

I would like to create a counter variable MyCount that runs along MyDat and increases by 1 whenever the value of MyDat changes from one row to the next. Ideally, the result would look something like this:

MyCount MyDat
1 a
1 a
2 b
2 b
2 b
3 a
3 a
3 a
3 a
4 b

I am struggling to set up a loop that checks whether each row's value equals the previous row's value and, if not, adds one to the counter. It also appears that the iteration has to start at the second row. Something like:

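testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))

# One counter value per row: nrow(), because length() of a data frame counts columns
v <- vector(mode = "integer", length = nrow(testdat))
counter <- 1L
v[1] <- counter
# Start at row 2 so that i - 1 always points at an existing row
for (i in seq_len(nrow(testdat))[-1]) {
  # Compare single values and bump the counter only when the value changes
  if (testdat$MyDat[i] != testdat$MyDat[i - 1]) {
    counter <- counter + 1L
  }
  v[i] <- counter
}

both <- cbind(MyCount = v, testdat)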
M--
Jamie_R
  • Check [here](https://stackoverflow.com/questions/6112803/how-to-create-a-consecutive-group-number/74129105#74129105) for a complete set of answers – Maël Mar 06 '23 at 17:36
  • @Maël while there are answers there that would address this question, they are not exactly the same questions. However, I guess that's why you didn't use your Mjolnir. Cheers. – M-- Mar 06 '23 at 19:11

3 Answers

Use consecutive_id() (available in dplyr 1.1.0 and later):

library(dplyr)
testdat %>%
  mutate(MyCount = consecutive_id(MyDat), .before = 1)

-output

    MyCount MyDat
1        1     a
2        1     a
3        2     b
4        2     b
5        2     b
6        3     a
7        3     a
8        3     a
9        3     a
10       4     b

Or in base R with rle

with(rle(testdat$MyDat), rep(seq_along(values), lengths))
 [1] 1 1 2 2 2 3 3 3 3 4
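
To see why the rle one-liner works, it may help to look at the intermediate object. The sketch below unpacks the same steps; the names r and my_count are only illustrative, and the as.character() call is a precaution that assumes MyDat might be stored as a factor, which rle() does not accept.

```r
testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))

# rle() needs an atomic vector, so convert in case MyDat is a factor
r <- rle(as.character(testdat$MyDat))

r$values   # one entry per run:  "a" "b" "a" "b"
r$lengths  # length of each run: 2 3 4 1

# Number the runs 1, 2, 3, ... and repeat each number by its run length
my_count <- rep(seq_along(r$values), times = r$lengths)
my_count   # 1 1 2 2 2 3 3 3 3 4
```

Because the run numbers come from seq_along(), the result is an integer vector that can be assigned straight back with testdat$MyCount <- my_count.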
akrun

Two base options:

# (1)
cumsum(c(1, tail(testdat$MyDat, -1) != head(testdat$MyDat, -1)))

# [1] 1 1 2 2 2 3 3 3 3 4
# (2)
cumsum(c(1, diff(as.integer(factor(testdat$MyDat))) != 0))

# [1] 1 1 2 2 2 3 3 3 3 4
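
Option (1) can be read in three steps, sketched here on a plain vector. The names x, changed and my_count are just for illustration, and the sketch assumes the column contains no NA values, since a comparison involving NA would propagate into cumsum().

```r
x <- c("a","a","b","b","b","a","a","a","a","b")

# Compare every element with the one before it (result has length n - 1):
# TRUE marks the first row of a new run
changed <- tail(x, -1) != head(x, -1)

# Prepend 1 so the first row opens run number 1, then accumulate:
# each TRUE counts as 1 and pushes the counter up by one
my_count <- cumsum(c(1, changed))
my_count   # 1 1 2 2 2 3 3 3 3 4
```

Option (2) follows the same idea but detects a change as a nonzero diff() of the integer factor codes.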
Darren Tsai

This is what data.table::rleid does:

library(data.table)

setDT(testdat)[ , MyCount := rleid(MyDat)]
#>     MyDat MyCount
#>  1:     a       1
#>  2:     a       1
#>  3:     b       2
#>  4:     b       2
#>  5:     b       2
#>  6:     a       3
#>  7:     a       3
#>  8:     a       3
#>  9:     a       3
#> 10:     b       4
M--