
I have the following sequence in df (dput below):

> df
   value
1     -2
2     -1
3      0
4      1
5      2
6     -3
7     -2
8     -1
9      0
10     1
11    -1
12     0
13     1
14   -10
15    -9
16    -8
17    -7

Within a sequence, each value is exactly 1 greater than the previous one. The desired output should therefore look like this:

   value group
1     -2     1
2     -1     1
3      0     1
4      1     1
5      2     1
6     -3     2
7     -2     2
8     -1     2
9      0     2
10     1     2
11    -1     3
12     0     3
13     1     3
14   -10     4
15    -9     4
16    -8     4
17    -7     4 

As you can see, the first sequence is -2, -1, 0, 1, 2; the next value, -3, starts a new sequence. I tried the following code:

library(dplyr)
df %>% 
  group_by(grp = cumsum(coalesce(value == -lag(value, n = 1), TRUE)))
#> # A tibble: 17 × 2
#> # Groups:   grp [2]
#>    value   grp
#>    <dbl> <int>
#>  1    -2     1
#>  2    -1     1
#>  3     0     1
#>  4     1     1
#>  5     2     1
#>  6    -3     1
#>  7    -2     1
#>  8    -1     1
#>  9     0     1
#> 10     1     1
#> 11    -1     2
#> 12     0     2
#> 13     1     2
#> 14   -10     2
#> 15    -9     2
#> 16    -8     2
#> 17    -7     2

Created on 2023-01-23 with reprex v2.0.2

This doesn't work because the jump between consecutive sequences varies in size. Does anyone know how to group sequences like these?


dput of df:

df<-structure(list(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1, 
0, 1, -10, -9, -8, -7)), class = "data.frame", row.names = c(NA, 
-17L))
Quinten

1 Answer

Edit: abs() is unnecessary if every sequence runs in the same direction (always increasing by 1).

Start a new group wherever the absolute difference from the previous value is not 1:

library(dplyr)
df %>% 
  group_by(grp = cumsum(c(TRUE, abs(diff(df$value)) != 1)))
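To make the one-liner above easier to follow, here is a minimal base R sketch that breaks it into steps. The intermediate names `steps` and `is_start` are mine, not from the answer; the data is copied from the question.

```r
# Data copied from the question's dput
value <- c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1, 0, 1, -10, -9, -8, -7)

# Step 1: difference between each value and the one before it
# (one element shorter than `value`)
steps <- diff(value)

# Step 2: a new group starts wherever the step is not +1 or -1;
# the first row always starts a group, hence the leading TRUE
is_start <- c(TRUE, abs(steps) != 1)

# Step 3: the running count of group starts is the group id
grp <- cumsum(is_start)

data.frame(value, grp)
```

Because TRUE counts as 1 in cumsum(), the id increases by one at each break and stays constant in between.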

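Or with lag (coalesce() turns the NA on the first row into TRUE, so the first row always opens a group whatever its value):

df %>% 
  group_by(grp = cumsum(coalesce(abs(value - lag(value)) != 1, TRUE)))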
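The edit note about dropping abs() only matters when a sequence can also step downward. The sketch below shows the difference on a small made-up vector; `group_runs()` and its `ascending_only` argument are hypothetical names introduced for illustration, not part of dplyr or the original answer.

```r
# Hypothetical helper: assigns a group id to each run of consecutive values.
# Assumes `x` is a numeric vector with at least one element.
group_runs <- function(x, ascending_only = TRUE) {
  step <- diff(x)
  # With abs(), a step of -1 also continues the current group
  if (!ascending_only) step <- abs(step)
  cumsum(c(TRUE, step != 1))
}

x <- c(1, 2, 3, 2, 1)  # made-up example: rises, then falls

group_runs(x)                          # falling steps open new groups
group_runs(x, ascending_only = FALSE)  # falling steps stay in the same group
```

On the question's data both variants give the same four groups, since every step inside a sequence is +1.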
Maël