Create a sequence of numbers that increments for every change in another variable

Question

What is an efficient way to create a sequence of numbers that increments each time a group variable changes? As a toy example, using the data frame below, I would like a new variable, "Value", to take the values c(1,1,1,2,2,3,3,4). Note that even though 48 appears again later, "Value" still increases, because I'm only concerned with changes from one row to the next.

df <- read.table(textConnection(
  'Group 
  48 
  48
  48
  56
  56
  48
  48
  14'), header = TRUE)

One way to do this is

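df$Value <- 1
for (i in 2:nrow(df)) {
  if (df$Group[i] == df$Group[i - 1]) {
    df$Value[i] <- df$Value[i - 1]      # same group as the previous row: keep the counter
  } else {
    df$Value[i] <- df$Value[i - 1] + 1  # group changed: increment the counter
  }
}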

but this is very slow. My actual dataset has several million observations.

Note: I had a difficult time wording the title of this question so please change it if you'd like.

The tidyverse team [credit](https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs/#consecutive_id) `data.table::rleid()` as the inspiration for their equivalent, `dplyr::consecutive_id()`. (Silverfish, Mar 17 '23 at 03:28)

jay.sf · Accepted Answer · score 5 · 2022-01-05T07:15:17.463

We could also hack the run-length encoding returned by rle().

r <- rle(df$Group)
r$values <- seq_along(r$lengths)
inverse.rle(r)
# [1] 1 1 1 2 2 3 3 4

Data

df <- structure(list(Group = c(48L, 48L, 48L, 56L, 56L, 48L, 48L, 14L
)), class = "data.frame", row.names = c(NA, -8L))
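To see why the rle() hack works, it may help to print the intermediate run-length encoding. This sketch uses a plain numeric vector `group` (a name introduced here for illustration) holding the question's toy data, and also checks that `inverse.rle()` amounts to repeating each run index by its run length.

```r
# Toy data from the question, as a plain vector
group <- c(48, 48, 48, 56, 56, 48, 48, 14)

# rle() collapses consecutive repeats into runs
r <- rle(group)
r$lengths   # 3 2 2 1
r$values    # 48 56 48 14

# Overwrite each run's value with its run index, then expand back
r$values <- seq_along(r$lengths)
run_id <- inverse.rle(r)
run_id      # 1 1 1 2 2 3 3 4

# Equivalent expansion without inverse.rle()
identical(run_id, rep(seq_along(r$lengths), r$lengths))  # TRUE
```

One thing to watch: base R's rle() compares neighbours with `!=`, so each NA in Group starts its own run rather than being merged with adjacent NAs.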

edited Jan 05 '22 at 07:15

answered Jun 24 '19 at 17:05

mirirai · Answer 2 · 2022-07-25T18:42:18.317

Inspired by this post: https://stackoverflow.com/a/44512144/3772141

Just do this:

library(dplyr)

df %>%
  mutate(Value = cumsum(Group != lag(Group) | row_number() == 1))

Result:

# Group Value
#    48     1
#    48     1
#    48     1
#    56     2
#    56     2
#    48     3
#    48     3
#    14     4

How it works:

Compare Group with Group from the previous row. If it changed, the result is TRUE, marking the start of a new Value: Group != lag(Group)
For the first row, lag() returns NA, so that comparison is NA. The first row should always be TRUE, and since NA | TRUE is TRUE, this fixes it: | row_number() == 1
TRUE and FALSE are treated as 1 and 0, so cumsum() increments Value whenever the inner expression is TRUE, which is exactly when Group changes.
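For readers without dplyr, the same three steps can be sketched in base R. Here `prev` is a hand-rolled stand-in for `dplyr::lag()` (not the dplyr function itself), built by shifting the vector right by one and padding with NA; `group` is a plain vector holding the toy data.

```r
group <- c(48, 48, 48, 56, 56, 48, 48, 14)

# Stand-in for lag(): shift right by one position, padding the front with NA
prev <- c(NA, group[-length(group)])

# TRUE where a new run starts; row 1 gives NA | TRUE, which is TRUE
starts <- group != prev | seq_along(group) == 1
starts   # TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE

# cumsum() treats TRUE as 1 and FALSE as 0, so it counts the runs seen so far
value <- cumsum(starts)
value    # 1 1 1 2 2 3 3 4
```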

score 0 · Answer 3 · answered Jun 28 '19 at 13:24

How about

library(tidyverse)
df = data.frame(Group = c(48, 48, 48, 56, 56, 48, 48, 14))

# Get unique values in group
unique_vals = unique(df$Group)

# Create a sequence from 1 up to the number of unique values
sequential_nums = seq_along(unique_vals)

# Create a new column by looking up each Group value in unique_vals
# and replacing it with the matching sequential number
df %>% 
  mutate(Value = sequential_nums[match(Group, unique_vals)])

# Group      Value 
# 1    48         1
# 2    48         1
# 3    48         1
# 4    56         2
# 5    56         2
# 6    48         1
# 7    48         1
# 8    14         3
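Note that this output (1 1 1 2 2 1 1 3) differs from the c(1,1,1,2,2,3,3,4) the question asks for: matching against unique() numbers distinct values, so the second run of 48 reuses id 1. A minimal base-R check, with `group` and `wanted` introduced here for illustration:

```r
group  <- c(48, 48, 48, 56, 56, 48, 48, 14)
wanted <- c(1, 1, 1, 2, 2, 3, 3, 4)   # target from the question

# This answer's lookup: sequential_nums[match(...)] with sequential_nums = 1..k
sequential_nums <- seq_along(unique(group))
value_id <- sequential_nums[match(group, unique(group))]

# Indexing 1..k by position changes nothing, so match() alone gives the same ids
identical(value_id, match(group, unique(group)))  # TRUE

# Rows where numbering distinct values disagrees with numbering runs
which(value_id != wanted)   # 6 7 8
```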

score 0 · Answer 4 · answered Mar 17 '23 at 03:25

If you're in the tidyverse, dplyr 1.1.0 added the function consecutive_id(), which does exactly what you want. The tidyverse team suggest it for Zoom call transcripts, where consecutive lines attributed to the same speaker should be grouped together as a single idea: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs/#consecutive_id

library(dplyr)
df <- read.table(textConnection(
  'Group 
  48 
  48
  48
  56
  56
  48
  48
  14'), header = TRUE)

df |> mutate(value = consecutive_id(Group))
#>   Group value
#> 1    48     1
#> 2    48     1
#> 3    48     1
#> 4    56     2
#> 5    56     2
#> 6    48     3
#> 7    48     3
#> 8    14     4
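The transcript use case can be sketched in base R as well. Everything here is illustrative: `transcript` is a hypothetical toy data frame, and the run ids come from a cumsum-of-changes stand-in rather than from consecutive_id() itself, so the sketch runs without dplyr.

```r
# Hypothetical transcript: one row per spoken line
transcript <- data.frame(
  speaker = c("Ann", "Ann", "Bob", "Ann", "Ann"),
  text    = c("Hi all.", "Let's start.", "Sounds good.", "First item:", "budget."),
  stringsAsFactors = FALSE
)

# Run id that increments whenever the speaker changes (what consecutive_id() computes)
n <- nrow(transcript)
turn <- cumsum(c(TRUE, transcript$speaker[-1] != transcript$speaker[-n]))

# One row per turn: the turn's speaker and its lines joined into one utterance
turns <- data.frame(
  turn    = unique(turn),
  speaker = transcript$speaker[!duplicated(turn)],
  text    = unname(vapply(split(transcript$text, turn), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
turns
```

With dplyr 1.1.0 or later, the same ids should come from consecutive_id(speaker) inside mutate(), followed by a grouped summarise.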
