Group random sequences in R

Question

I have a sequence of values in a data frame df (dput below).

Within each sequence, every value is exactly 1 greater than the previous one, so the desired output looks like this:

   value group
1     -2     1
2     -1     1
3      0     1
4      1     1
5      2     1
6     -3     2
7     -2     2
8     -1     2
9      0     2
10     1     2
11    -1     3
12     0     3
13     1     3
14   -10     4
15    -9     4
16    -8     4
17    -7     4

As you can see, the first sequence is -2, -1, 0, 1, 2; the next value, -3, starts a new sequence. I tried the following code:

library(dplyr)
df %>% 
  group_by(grp = cumsum(coalesce(value == -lag(value, n = 1), TRUE)))
#> # A tibble: 17 × 2
#> # Groups:   grp [2]
#>    value   grp
#>    <dbl> <int>
#>  1    -2     1
#>  2    -1     1
#>  3     0     1
#>  4     1     1
#>  5     2     1
#>  6    -3     1
#>  7    -2     1
#>  8    -1     1
#>  9     0     1
#> 10     1     1
#> 11    -1     2
#> 12     0     2
#> 13     1     2
#> 14   -10     2
#> 15    -9     2
#> 16    -8     2
#> 17    -7     2

Created on 2023-01-23 with reprex v2.0.2

This doesn't work because the jumps between sequences are arbitrary. Does anyone know how to group these sequences?

dput of df:

df <- structure(list(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1, 
0, 1, -10, -9, -8, -7)), class = "data.frame", row.names = c(NA, 
-17L))

Try `cumsum(c(TRUE, diff(df$value) != 1))` – Sotos, Jan 23 '23 at 10:37
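Step by step, on the df from the question, that one-liner can be read roughly as follows. This is a base R sketch; the intermediate names d and starts are only for illustration.

```r
# Data from the question's dput
df <- data.frame(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1,
                           -1, 0, 1, -10, -9, -8, -7))

# Step 1: difference between each value and the previous one (length n - 1)
d <- diff(df$value)

# Step 2: TRUE wherever the step is not +1, i.e. where a new sequence starts;
# the leading TRUE marks the first row as the start of group 1
starts <- c(TRUE, d != 1)

# Step 3: the running count of starts is the group id
df$group <- cumsum(starts)

df$group
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 4 4 4 4
```

The result matches the group column in the desired output.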

Maël · Accepted Answer · 2023-01-23T10:45:09.037

Edit: no need for abs if the sequence is always in the same direction.

A new group starts wherever the absolute difference from the previous value is not 1:

library(dplyr)
df %>% 
  group_by(grp = cumsum(c(TRUE, abs(diff(value)) != 1)))

Or with lag:

# lag() is NA on the first row, so coalesce() marks it as the start of group 1
df %>% 
  group_by(grp = cumsum(coalesce(abs(value - lag(value)) != 1, TRUE)))
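To illustrate when abs matters, here is a small base R sketch on a made-up vector x (not from the question) that rises and then falls by 1. With abs, a step of -1 still counts as part of the same run; without it, only steps of exactly +1 continue a group.

```r
# Hypothetical vector: goes up by 1, then down by 1, then up again
x <- c(1, 2, 3, 2, 1, 2)

# With abs(): steps of +1 and -1 both continue the current group
grp_abs <- cumsum(c(TRUE, abs(diff(x)) != 1))
grp_abs
#> [1] 1 1 1 1 1 1

# Without abs(): any step other than +1 starts a new group
grp_plus1 <- cumsum(c(TRUE, diff(x) != 1))
grp_plus1
#> [1] 1 1 1 2 3 3
```

For the data in the question, where every sequence increases by 1, both variants should give the same groups.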

edited Jan 23 '23 at 10:45

answered Jan 23 '23 at 10:39

