
I have:

vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

I expect:

magicFUN(x = vec1, y = vec2)
[1] 4 7 8

That is, I want the starting positions where the complete vector vec2 occurs inside vec1. match and is.element were not useful because they return a result for each element of vec2 separately, whereas I need magicFUN to match the whole of vec2 against vec1.
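To make the problem concrete, here is what the two functions mentioned above return for the example data. They treat each element of vec2 on its own, so neither can tell you where the pair c(1, 1) starts.

```r
vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

# match() looks up each element of vec2 independently: both 1s map to
# the first 1 in vec1, which is at position 4
match(vec2, vec1)
# [1] 4 4

# is.element() only reports whether each element occurs anywhere in vec1
is.element(vec2, vec1)
# [1] TRUE TRUE
```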

7 Answers


A general solution:

magicFUN <- function(vec1, vec2) {
  if(length(vec2) > length(vec1)) stop("vec 2 should be shorter")
  len <- length(vec1) - length(vec2) + 1
  out <- vector(mode = "logical", length=len)     
  for(i in 1:len) {
    out[i] <- identical(vec2, vec1[i:(i+length(vec2)-1)])
  }
  return(which(out))
}

vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

magicFUN(vec1, vec2)

[1] 4 7 8
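One caveat with the identical() comparison: it also checks storage type, so an integer vec1 (e.g. read from a file) never matches a double vec2. A sketch of a variant that compares values with == instead; the name magicFUN_eq is hypothetical. Note that == does not work element-wise on lists, so identical() is still the right choice if vec1 can be a list.

```r
vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

# identical() is type-strict: integer 1L is not identical to double 1
identical(c(1L, 1L), c(1, 1))
# [1] FALSE

# Hypothetical variant of magicFUN that compares values with ==
magicFUN_eq <- function(vec1, vec2) {
  if (length(vec2) > length(vec1)) stop("vec2 should be shorter")
  l2 <- length(vec2)
  len <- length(vec1) - l2 + 1
  out <- logical(len)
  for (i in seq_len(len)) {
    # isTRUE() turns an NA comparison into FALSE
    out[i] <- isTRUE(all(vec1[i:(i + l2 - 1)] == vec2))
  }
  which(out)
}

magicFUN_eq(as.integer(vec1), vec2)
# [1] 4 7 8
```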

In the benchmarks below, the for loop was the fastest of the three approaches (Rcpp would be an option if more speed is needed):

magicFUN2 <- function(vec1, vec2){
  l1 <- length(vec1)
  l2 <- length(vec2)
  # one column per start position; 1:(l1-l2+1) includes the last window
  which(colSums(sapply(1:(l1-l2+1), function(i) vec1[i:(i+l2-1)]) == vec2) == l2)
}

magicFUN3 <- function(vec1, vec2){
  which(c(zoo::rollapply(vec1, width=length(vec2), 
                    function(x)all(x==vec2), align = "left"),rep(FALSE,length(vec2)-1))==TRUE)
}

library(microbenchmark)
microbenchmark(magicFUN(vec1, vec2), magicFUN2(vec1, vec2), magicFUN3(vec1, vec2))

Unit: milliseconds
                  expr       min        lq      mean    median        uq       max neval cld
  magicFUN(vec1, vec2)  6.083572  6.575844  7.292443  6.878016  7.421208  13.35746   100  a 
 magicFUN2(vec1, vec2)  8.289640  8.976736 11.007967  9.338644  9.951492 139.68886   100  a 
 magicFUN3(vec1, vec2) 39.131268 42.369479 46.303722 44.203563 45.053252 172.46151   100   b
thc

Here is one way, though it won't scale well as the length of vec2 grows:

which(head(vec1, -1) == vec2[1] & tail(vec1, -1) == vec2[2])
# [1] 4 7 8
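The head()/tail() idea does generalize, though: compare the k-th shifted copy of vec1 with vec2[k] for every k and combine the results with &. A sketch, with the hypothetical name magicFUN_shift; the loop runs length(vec2) times (2 to 10 in the real data) rather than length(vec1) times, and each step is a vectorized comparison.

```r
vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

# Hypothetical generalisation of the head()/tail() comparison
magicFUN_shift <- function(vec1, vec2) {
  l1 <- length(vec1)
  l2 <- length(vec2)
  if (l2 > l1) stop("vec2 should be shorter")
  n <- l1 - l2 + 1              # number of possible start positions
  hit <- rep(TRUE, n)
  for (k in seq_len(l2)) {
    # vec1[k:(k + n - 1)] is vec1 shifted left by k - 1 positions
    hit <- hit & vec1[k:(k + n - 1)] == vec2[k]
  }
  which(hit)
}

magicFUN_shift(vec1, vec2)
# [1] 4 7 8
magicFUN_shift(vec1, c(0, 0, 1))
# [1]  2 10
```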

Edit: More general solution.

magicFUN <- function(vec1, vec2){
  l1 <- length(vec1)
  l2 <- length(vec2)
  # one column per start position; 1:(l1-l2+1) includes the last window
  which(colSums(sapply(1:(l1-l2+1), function(i) vec1[i:(i+l2-1)]) == vec2) == l2)
}

magicFUN(vec1, vec2)
# [1] 4 7 8
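One edge case with the sapply() version: when length(vec2) is 1, sapply() returns a plain vector instead of a matrix and colSums() stops with an error. A sketch of a matrix-based alternative using stats::embed(), whose row i holds the window vec1[i:(i + l2 - 1)] in reverse order; the name magicFUN_embed is hypothetical.

```r
vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

# Hypothetical alternative built on stats::embed()
magicFUN_embed <- function(vec1, vec2) {
  l2 <- length(vec2)
  if (l2 > length(vec1)) stop("vec2 should be shorter")
  # Row i of embed(vec1, l2) is the window vec1[i:(i + l2 - 1)], reversed
  win <- embed(vec1, l2)
  # Repeat rev(vec2) on every row so the comparison is element-wise
  target <- matrix(rev(vec2), nrow = nrow(win), ncol = l2, byrow = TRUE)
  which(rowSums(win == target) == l2)
}

magicFUN_embed(vec1, vec2)
# [1] 4 7 8
magicFUN_embed(vec1, 1)
# [1]  4  5  7  8  9 12
```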
zx8754
    Thank you, but of course the example I gave is just to explain my problem quickly. In my real data vec1 is a vector of length 10k and vec2 is a vector of variable length (2-10). – Wencheng Lau-Medrano May 30 '18 at 20:15

An option is to use the zoo package:

library(zoo)
which(c(rollapply(vec1, width=2, function(x)all(x==vec2), align = "left"),0)==TRUE)
[1] 4 7 8

Edited: Based on feedback from @G.Grothendieck:

The above solution generalizes by using width = length(vec2). Let's create magicFUN as:

magicFUN <- function(vec1, vec2){
  which(rollapply(vec1, length(vec2), identical, vec2, align = "left"))
}    

magicFUN(vec1, vec2)
#[1] 4 7 8
MKR
    Note that we can write: `which(rollapply(vec1, length(vec2), identical, vec2, align = "left"))` . – G. Grothendieck May 31 '18 at 21:41
    @G.Grothendieck Very valid point. Actually, I had modified my code to use that alternative while trying to improve performance, but it didn't improve much, so I didn't think of updating it. It's good that you pointed it out today. I'll update my answer now. Thanks. – MKR May 31 '18 at 22:02

Here is another method using grep:

vec1 <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
vec2 <- c(1, 1)

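library(dplyr)  # stats::lag() does not shift the values of a plain vector; dplyr::lag() does

magicFUN = function(x, y){
  y_len = length(y)
  # one string per window of x, e.g. "1 1", built from lagged copies of x
  temp_x = do.call(paste, lapply((y_len-1):0, function(lags){
    dplyr::lag(x, lags)[-(0:(y_len-1))]
  }))
  temp_y = paste(y, collapse = ' ')
  return(grep(temp_y, temp_x, fixed = TRUE))
}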

magicFUN(vec1, vec2)
# [1] 4 7 8
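Be aware that grep(fixed = TRUE) does substring matching, so once the values have more than one digit a window such as "11 1" also contains "1 1" and is reported as a hit. A sketch that keeps the paste idea but compares whole window strings with ==; the name magicFUN_paste is hypothetical, it uses base R only, and it compares values through their printed form.

```r
# Hypothetical helper: one string per window, compared exactly with ==
magicFUN_paste <- function(x, y) {
  y_len <- length(y)
  n <- length(x) - y_len + 1
  windows <- vapply(seq_len(n),
                    function(i) paste(x[i:(i + y_len - 1)], collapse = " "),
                    character(1))
  which(windows == paste(y, collapse = " "))
}

# Substring search flags "11 1" as well as "1 1"
grep("1 1", c("11 1", "1 1"), fixed = TRUE)
# [1] 1 2

# Whole-string comparison only matches the real window
magicFUN_paste(c(11, 1, 1), c(1, 1))
# [1] 2
```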
acylam

Here's an attempt to vectorize this using the data.table package. Note that if vec2 is very long, this could run into memory issues, since shift() creates one column of length(vec1) for every element of vec2.

library(data.table)
l2 <- length(vec2)
setDT(shift(vec1, 0 : (l2 - 1), type = "lead")
      )[, which(rowSums(.SD == vec2[col(.SD)]) == l2)]
## [1] 4 7 8
David Arenburg

A general solution, where a is the longer vector and b the shorter:

magicFun <- function(a,b){
  la <- length(a)
  lb <- length(b)
  # one slot per possible start position of b within a
  out <- vector(mode = "numeric", length = la-lb+1)
  for(i in 1:(la-lb+1)) {
    out[i] <- if (isTRUE(all(a[i:(i+lb-1)] == b))) i else 0
  }
  out <- out[out != 0]
  return(out)
}

magicFun(vec1,vec2)

[1] 4 7 8
moooh

In base R:

which(sapply(seq_along(vec1),function(x) identical(vec1[x:(x+length(vec2)-1)],vec2)))
# [1] 4 7 8

This should support lists and any other data type.
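The claim about other data types can be checked with a small sketch. It uses the same identical() test but stops at the last complete window and uses vapply() so the result is always a logical vector; the name magicFUN_any is hypothetical.

```r
# Hypothetical helper: identical()-based window search for any vector or list
magicFUN_any <- function(x, y) {
  n <- length(x) - length(y) + 1
  if (n < 1) return(integer(0))
  hits <- vapply(seq_len(n),
                 function(i) identical(x[i:(i + length(y) - 1)], y),
                 logical(1))
  which(hits)
}

magicFUN_any(c("a", "b", "a", "b"), c("a", "b"))
# [1] 1 3
magicFUN_any(list(1, "x", 1, "x"), list(1, "x"))
# [1] 1 3
```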

moodymudskipper