The following vector x contains the two sequences 1:4 and 6:7, along with other non-sequential values.

x <- c(7, 1:4, 6:7, 9)

I'd like to split x by its sequences, so that the result is a list like the following.

# [[1]]
# [1] 7
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 6 7
#
# [[4]]
# [1] 9

Is there a quick and simple way to do this?

I've tried

split(x, c(0, diff(x)))

which gets close, but I don't think prepending 0 to the differenced vector is the right way to go. Using findInterval didn't work either.
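
To see where this attempt falls short, it may help to look at the grouping vector it builds. The sketch below only re-runs the call from the question; the variable name `g` is introduced here for illustration. Because the raw differences are used as group labels, elements are grouped by the size of the step that led to them, not by the run they belong to.

```r
x <- c(7, 1:4, 6:7, 9)

# Grouping vector used in the attempt: a leading 0 plus the raw differences
g <- c(0, diff(x))
g
# [1]  0 -6  1  1  1  2  1  2

# split() treats each distinct difference as its own group,
# so 7 (reached by a step of 1) lands with 2, 3, 4 and so on
res <- split(x, g)
res
```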

Rich Scriven

2 Answers

split(x, cumsum(c(TRUE, diff(x)!=1)))
#$`1`
#[1] 7
#
#$`2`
#[1] 1 2 3 4
#
#$`3`
#[1] 6 7
#
#$`4`
#[1] 9
Roland
  • Can you explain how the diff() function works and what it is doing in this solution? The official documentation on the diff() function did not help me understand it. – OnlyDean Jun 21 '18 at 14:36
  • The function simply calculates all differences between consecutive vector elements. E.g., compare `print(x <- (1:5)^2)` with `diff(x)`. Since OP defined sequences as values having a difference of exactly one, I check for differences different from one. Check out (with OP's data) `diff(x); diff(x)!=1; cumsum(c(TRUE, diff(x)!=1))`. – Roland Jun 21 '18 at 14:45
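
The steps described in the comment above can be laid out one at a time. This is only a walk-through of the one-liner from the answer, using the data from the question; the intermediate names `d`, `starts`, and `groups` are introduced here for illustration.

```r
x <- c(7, 1:4, 6:7, 9)

# Differences between consecutive elements
d <- diff(x)
d
# [1] -6  1  1  1  2  1  2

# TRUE marks the first element of each run: the first element always
# starts a run, and so does any element whose step from its
# predecessor is not exactly 1
starts <- c(TRUE, d != 1)
starts
# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

# The cumulative sum turns those markers into a run id per element
groups <- cumsum(starts)
groups
# [1] 1 2 2 2 2 3 3 4

# split() then collects the elements that share a run id
res <- split(x, groups)
```

Prepending `TRUE` (rather than `0`) is what keeps the grouping vector the same length as `x`, since `diff()` returns one element fewer than its input.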

Just for fun, you can make use of Carl Witthoft's seqle function from his "cgwtools" package. (It's not going to be anywhere near as efficient as Roland's answer.)

library(cgwtools)

## Here's what seqle does...
## It's like rle, but for sequences
seqle(x)
# Run Length Encoding
#   lengths: int [1:4] 1 4 2 1
#   values : num [1:4] 7 1 6 9

y <- seqle(x)
split(x, rep(seq_along(y$lengths), y$lengths))
# $`1`
# [1] 7
# 
# $`2`
# [1] 1 2 3 4
# 
# $`3`
# [1] 6 7
# 
# $`4`
# [1] 9
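
If installing a package is not an option, the same run lengths and starting values shown in the seqle output above can be reproduced in base R. This is a minimal sketch, not a description of how seqle is implemented; it assumes runs are defined by a step of exactly 1, and the names `starts`, `run_lengths`, and `run_values` are introduced here for illustration.

```r
x <- c(7, 1:4, 6:7, 9)

# Mark the first element of each run (step from predecessor is not 1)
starts <- c(TRUE, diff(x) != 1)

# rle() on the run ids gives the length of each run
run_lengths <- rle(cumsum(starts))$lengths
run_lengths
# [1] 1 4 2 1

# The value that opens each run
run_values <- x[starts]
run_values
# [1] 7 1 6 9

# Same split as above, built from the run lengths
res <- split(x, rep(seq_along(run_lengths), run_lengths))
```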
A5C1D2H2I1M1N2O1R2T1