Is there a way to encode increasing integer sequences in R, analogous to encoding run lengths using run length encoding (rle)?
I'll illustrate with an example:
Analogy: Run length encoding
r <- c(rep(1, 4), 2, 3, 4, rep(5, 5))
rle(r)
Run Length Encoding
lengths: int [1:5] 4 1 1 1 5
values : num [1:5] 1 2 3 4 5
Desired: sequence length encoding
s <- c(1:4, rep(5, 4), 6:9)
s
[1] 1 2 3 4 5 5 5 5 6 7 8 9
somefunction(s)
Sequence lengths
lengths: int [1:4] 5 1 1 5
value1 : num [1:4] 1 5 5 5
Edit 1
Thus, somefunction(1:10) will give the result:
Sequence lengths
lengths: int [1:1] 10
value1 : num [1:1] 1
This result means that there is an integer sequence of length 10 with a starting value of 1, i.e. seq(1, 10).
Note that there isn't a mistake in my example result. The vector in fact ends in the sequence 5:9, not the 6:9 that was used to construct it, because the last of the repeated 5s joins the following 6:9.
My use case is that I am working with survey data in an SPSS export file. Each subquestion in a grid of questions will have a name of the pattern paste("q", 1:5), but sometimes there is an "other" category which will be marked q_99, q_other or something else. I wish to find a way of identifying the sequences.
Edit 2
In a way, my desired function is the inverse of the base function sequence, with the start value, value1 in my example, added.
lengths <- c(5, 1, 1, 5)
value1 <- c(1, 5, 5, 5)
s
[1] 1 2 3 4 5 5 5 5 6 7 8 9
sequence(lengths) + rep(value1-1, lengths)
[1] 1 2 3 4 5 5 5 5 6 7 8 9
Edit 3
I should have stated that for my purposes a sequence is a run of consecutive integers increasing by exactly 1 at each step, as opposed to any monotonically increasing sequence: c(4,5,6,7) qualifies, but c(2,4,6,8) and c(5,4,3,2,1) do not. However, any other integer can appear between sequences.
This means a solution should be able to cope with this test case:
somefunction(c(2, 4, 1:4, 5, 5))
Sequence lengths
lengths: int [1:4] 1 1 5 1
value1 : num [1:4] 2 4 1 5
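To make the definition concrete, here is a minimal sketch of how such an encoding could be computed with diff and cumsum. The name seq_rle is hypothetical (it stands in for somefunction and is not a base R function), and the sketch assumes a numeric vector without NA values.

```r
# Hypothetical helper standing in for "somefunction"; not part of base R.
# Assumes x is a numeric (or integer) vector with no NA values.
seq_rle <- function(x) {
  if (length(x) == 0L) {
    return(list(lengths = integer(0), value1 = x))
  }
  # A new run starts at the first element and wherever the step
  # from the previous element is anything other than exactly +1.
  starts <- c(TRUE, diff(x) != 1)
  # Number the runs 1, 2, 3, ... and count the elements in each.
  run_id <- cumsum(starts)
  list(lengths = tabulate(run_id), value1 = x[starts])
}

res <- seq_rle(c(2, 4, 1:4, 5, 5))
res$lengths  # 1 1 5 1
res$value1   # 2 4 1 5
```

The result is a plain list rather than a classed object like the one rle returns, so it prints differently from the mock output above; a print method could be added if the same layout is wanted.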
In the ideal case, the solution can also cope with the use case suggested originally, which would include characters in the vector, e.g.
somefunction(c(2, 4, 1:4, 5, 5, "other"))
Sequence lengths
lengths: int [1:5] 1 1 5 1 1
value1 : num [1:5] 2 4 1 5 "other"
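A rough sketch of how the character case might be handled is shown below. It is only one possible approach: the name seq_rle_chr is hypothetical, and it assumes that every non-numeric label (such as "other") should form a run of length 1 on its own. Because the input is a character vector, value1 comes back as character rather than numeric.

```r
# Hypothetical helper; not part of base R.
# Assumes labels are either integer-like strings or arbitrary text.
seq_rle_chr <- function(x) {
  x <- as.character(x)
  if (length(x) == 0L) {
    return(list(lengths = integer(0), value1 = character(0)))
  }
  # Non-numeric labels become NA; the coercion warning is expected here.
  num <- suppressWarnings(as.numeric(x))
  step <- diff(num)
  # An element continues a run only if the step from its predecessor
  # is exactly +1; an NA step (text on either side) always breaks the run.
  continues <- !is.na(step) & step == 1
  starts <- c(TRUE, !continues)
  list(lengths = tabulate(cumsum(starts)), value1 = x[starts])
}

res <- seq_rle_chr(c(2, 4, 1:4, 5, 5, "other"))
res$lengths  # 1 1 5 1 1
res$value1   # "2" "4" "1" "5" "other"
```

For names such as q_99 or q_other, the common prefix would first have to be stripped (for example with sub) before passing the suffixes to a function like this.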