Increment by 1 for every change in column

Question

Let's say I have the following data frame:

set.seed(123)
df <- data.frame(var1=(runif(10)>0.5)*1)

var1 could be of any type and have any number of levels, not specifically 0s and 1s.

I would like to create a var2 that increments by 1 every time var1 changes, without using a for loop.

Expected result in this case is:

data.frame(var1=df$var1, var2=c(1, 2, 3, 4, 4, 5, 6, 6, 6, 7))

var1 var2
   0    1
   1    2
   0    3
   1    4
   1    4
   0    5
   1    6
   1    6
   1    6
   0    7

Another option for the data frame could be:

df <- data.frame(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))

In this case the result should be:

var1 var2
   a    1
   a    1
   1    2
   0    3
   b    4
   b    4
   b    4
   c    5
   1    6
   1    6

cmbarbu · Accepted Answer · 2015-04-15T21:45:24.453

17

Building on MrFlick's answer:

df$var2 <- cumsum(c(0,as.numeric(diff(df$var1))!=0))

But if you don't want to use diff you can still use:

df$var2 <- c(0,cumsum(as.numeric(with(df,var1[1:(length(var1)-1)] != var1[2:length(var1)]))))

It starts at 0, not at 1, but I'm sure you see how to change it if you want to.
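As a minimal sketch of the same neighbour-comparison idea, the version below starts at 1 and does not depend on `diff()`. That matters because, since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `diff()` errors on a character column. The helper name `change_id` is hypothetical and not part of the answer above, and `NA` values are not handled.

```r
# change_id() is a hypothetical helper name, not part of the answer above.
change_id <- function(x) {
  x <- as.character(x)                      # treat factor, character and numeric input alike
  if (length(x) == 0L) return(integer(0))   # nothing to number
  # TRUE for the first element, then TRUE wherever a value differs from its predecessor
  cumsum(c(TRUE, x[-1L] != x[-length(x)]))
}

change_id(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0))
#> [1] 1 2 3 4 4 5 6 6 6 7
change_id(c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))
#> [1] 1 1 2 3 4 4 4 5 6 6
```

With a data frame this would be used as `df$var2 <- change_id(df$var1)`.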

edited Apr 15 '15 at 21:45

answered Apr 15 '15 at 21:37


score 13 · Answer 2 · answered Apr 15 '15 at 21:31


How about using diff() and cumsum()? For example:

df$var2 <- cumsum(c(1,diff(df$var1)!=0))


MrFlick


The levels of `var1` could be anything, not just 0 and/or 1. Like `c("a", "a", "1", "0", "b", "b", "a", ....)` – dimitris_ps Apr 15 '15 at 21:33
I get the `Warning message: In is.na(r) : is.na() applied to non-(list or vector) of type 'NULL'` – dimitris_ps Apr 15 '15 at 21:40

You need to use `as.numeric(diff(df$var1)) != 0` and not diff() alone – cmbarbu Apr 15 '15 at 21:44

Martin Morgan · Answer 3 · 2015-04-15T22:20:54.753

These look like a run-length encoding (rle)

x = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
r = rle(x)

with

> rle(x)
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"

This says that the first value ("a") occurred 2 times in a row, then "1" occurred once, etc. What you're after is to create a sequence along the 'lengths', and replicate each element of that sequence by the number of times the corresponding value occurs, so

> rep(seq_along(r$lengths), r$lengths)
 [1] 1 1 2 3 4 4 4 5 6 6

The other answers are semi-deceptive, since they rely on the column being a factor(); they fail when the column is actually a character().

> diff(x)
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator

A work-around would be to map the characters to integers, along the lines of

> diff(match(x, x))
[1]  0  2  1  1  0  0  3 -5  0
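To complete that work-around, the integer codes from `match()` can be fed into the `diff()`/`cumsum()` pattern from the other answers. This is only a sketch; the names `codes` and `var2` are chosen here for illustration.

```r
x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

codes <- match(x, x)                    # position of each value's first occurrence
var2 <- cumsum(c(1, diff(codes) != 0))  # add 1 whenever the code changes
var2
#> [1] 1 1 2 3 4 4 4 5 6 6
```

This gives the same grouping as the `rep(seq_along(r$lengths), r$lengths)` result above.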

Hmm, but having said that I find that rle's don't work on factors!

> f = factor(x)
> rle(f)
Error in rle(f) : 'x' must be a vector of an atomic type
> rle(as.vector(f))
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"

score 6 · Answer 4 · answered Mar 24 '20 at 15:45

I am only copying Martin Morgan's rle() answer above, but implementing it using tidyverse conventions in order to add the grouping column directly to a dataframe/tibble, which is how I end up using it most of the time.

## Using run-length-encoding, create groups of identical values and put that
## common grouping identifier into a `grp` column.
library(tidyverse)

set.seed(42)

df <- tibble(x = sample(c(0,1), size=20, replace=TRUE, prob = c(0.2, 0.8)))

df %>%
    mutate(grp = rle(x)$lengths %>% {rep(seq(length(.)), .)})
#> # A tibble: 20 x 2
#>        x   grp
#>    <dbl> <int>
#>  1     0     1
#>  2     0     1
#>  3     1     2
#>  4     0     3
#>  5     1     4
#>  6     1     4
#>  7     1     4
#>  8     1     4
#>  9     1     4
#> 10     1     4
#> 11     1     4
#> 12     1     4
#> 13     0     5
#> 14     1     6
#> 15     1     6
#> 16     0     7
#> 17     0     7
#> 18     1     8
#> 19     1     8
#> 20     1     8

score 5 · Answer 5 · answered Apr 30 '17 at 11:27

Here is another solution with base R using inverse.rle():

df <- data.frame(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))
r <- rle(as.character(df$var1))
r$values <- seq_along(r$values)
df$var2 <- inverse.rle(r)

Short version:

df$var2 <- with(rle(as.character(df$var1)), rep(seq_along(values), lengths))

Here is a solution with data.table:

library("data.table")
dt <- data.table(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))
dt[, var2:=rleid(var1)]

score 0 · Answer 6 · answered Oct 21 '22 at 05:41


Using dplyr::lag:

library(dplyr)
df <- df %>% mutate(var2 = cumsum(row_number() == 1 | (var1 != dplyr::lag(var1))))
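The `row_number() == 1` term is there because the lagged vector has `NA` in its first position, so the first comparison is `NA` rather than `TRUE`. The base-R sketch below mimics that logic on a small made-up vector; `prev` is a stand-in for `dplyr::lag(x)` and the variable names are illustrative only.

```r
x <- c("a", "a", "b", "b", "a")

prev <- c(NA, head(x, -1))   # base-R stand-in for dplyr::lag(x)
changed <- x != prev         # NA FALSE TRUE FALSE TRUE
# TRUE | NA is TRUE, so the first row always opens group 1
var2 <- cumsum(seq_along(x) == 1 | changed)
var2
#> [1] 1 1 2 2 3
```

Without the first-row term, `cumsum()` would propagate the leading `NA` through the whole column.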


moreQthanA

