I want to add a logical column to a data.table that indicates whether a given sequence of values occurs, in order, within each row, regardless of how many times each value in the sequence is repeated consecutively.
For example, in c(1,1,1,3,3,2,2,2,2,2,1) I want to know whether c(1,3,2) occurs in that order. It does not matter how long each run of a value is. Using rle first, and then "%seq_in%" as defined by a user in this post, we can do the following:
# this function searches for a specific vector in order in another vector
"%seq_in%" = function(b,a) any(sapply(1:(length(a)-length(b)+1),function(i) all(a[i:(i+length(b)-1)]==b)))
v1 <- c(1,1,1,3,3,2,2,2,2,2,1)
c(1,3,2) %seq_in% rle(v1)$values
[1] TRUE
# for clarity
c(1,2,3) %seq_in% rle(v1)$values
[1] FALSE
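For reference, this is a small sketch of what rle contributes in the example above. It only uses base R; the name runs is just an illustrative variable, not something from the linked post.

```r
# Run-length encode the example vector from above
v1 <- c(1, 1, 1, 3, 3, 2, 2, 2, 2, 2, 1)
runs <- rle(v1)

runs$values   # the distinct consecutive values: 1 3 2 1
runs$lengths  # how long each run is: 3 2 5 1
```

Because only runs$values is searched, the run lengths drop out of the comparison, which is why c(1,3,2) is found and c(1,2,3) is not.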
So I would like to do the same for a data.table: look for a specific sequence, regardless of the run length of each element, in every row of the data.table.
# dummy data
library(data.table)
dt_dummy <- data.table(A = c(2,2,3,3,1),B = c(3,2,2,1,3), C = c(2,2,3,3,1), D = c(2,3,2,2,3),
E = c(2,3,2,1,1), F = c(2,2,2,1,3), G = c(3,2,3,2,2), H = c(2,3,1,2,2))
dt_dummy
A B C D E F G H
1: 2 3 2 2 2 2 3 2
2: 2 2 2 3 3 2 2 3
3: 3 2 3 2 2 2 3 1
4: 3 1 3 2 1 1 2 2
5: 1 3 1 3 1 3 2 2
# define simple function to return the values from rle
f1 <- function(v){
v1 <- unlist(rle(v)$values)
return(v1)
}
# apply to every row of dt
dt_dummy[, GCG_Rot := c(3,2,3) %seq_in% f1(dt_dummy), by = seq_len(nrow(dt_dummy))]
I can't seem to get this to work. The generated column should be TRUE or FALSE for each row.
Rows 1, 2, & 3 should adhere to the nominated sequence and return TRUE.
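To make the expected result concrete, here is a rough base R sketch that checks each row of the same dummy values held as a plain matrix. The names m and expected are only illustrative, and this is not meant as the data.table solution I am after, just a way to confirm which rows should match.

```r
# Same dummy values as dt_dummy, held as a plain matrix (base R only)
m <- cbind(A = c(2,2,3,3,1), B = c(3,2,2,1,3), C = c(2,2,3,3,1), D = c(2,3,2,2,3),
           E = c(2,3,2,1,1), F = c(2,2,2,1,3), G = c(3,2,3,2,2), H = c(2,3,1,2,2))

# %seq_in% exactly as defined earlier in this post
"%seq_in%" <- function(b, a) any(sapply(1:(length(a)-length(b)+1), function(i) all(a[i:(i+length(b)-1)] == b)))

# For each row: collapse consecutive repeats with rle, then search for c(3,2,3)
expected <- apply(m, 1, function(row) c(3, 2, 3) %seq_in% rle(unname(row))$values)
expected
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```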
If there's a way of dropping %seq_in%, I'm all for it!