2

I would like to know the starting index of a vector in another vector. For example, for c(1, 1) and c(1, 0, 0, 1, 1, 0, 1) it would be 4.

Importantly, I want to match exactly the same vector, as a complete run. Thus, for c(1, 1) inside c(1, 0, 1, 1, 1, 0) the result is FALSE, because the run there is c(1, 1, 1), not c(1, 1).

For now I am checking whether the short vector is contained in the long one like this:

any(with(rle(longVec), lengths[as.logical(values)]) == length(shortVec))

But I don't know how to determine its index...

Henrik
jake-ferguson
  • 1
    c(1,1) also doesn't equal c(1,1,0). How do you decide that your example is false but mine should give you 1 as the result? – iod Nov 14 '18 at 15:39
  • I don't understand what you mean, but for c(1,1) inside c(1,1,0) the check I wrote gives TRUE. I would just need the index (1 in this case) as well – jake-ferguson Nov 14 '18 at 15:42
  • 1
    Related: [Get indexes of a vector of numbers in another vector](https://stackoverflow.com/questions/48660606/get-indexes-of-a-vector-of-numbers-in-another-vector) – Henrik Nov 14 '18 at 15:57
  • Based on the link provided by @Henrik, you can start with `shortVec <- c(1,1); longVec <- c(1,0,0,1,1,0,1); idx <- which(longVec == shortVec[1]); idx[sapply(idx, function(i) all(longVec[i:(i+(length(shortVec)-1))] == shortVec))][1]` – nghauran Nov 14 '18 at 16:27
  • 1
    @jake-ferguson dude, why did you delete your other question? I put quite a bit of effort in answering it. – Gregor Thomas Nov 15 '18 at 04:01

3 Answers

3

This function should work:

my_function <- function(x, find) {
  # build two matrices from rle(): row 1 holds run lengths, row 2 holds run values
  m = matrix(unlist(rle(x)), nrow=2, byrow = TRUE)
  n = matrix(unlist(rle(find)), nrow=2, byrow = TRUE)

  # compare each column (run) of m with n; this gives a matrix of TRUE/FALSE
  temp_bool = apply(m, 2, function(x) x == n)
  # sum by column: a 2 means both the run length and the run value match
  temp_bool = apply(temp_bool, 2, sum)

  # updated part: convert the matching run index into a position in x
  if (any(temp_bool==2)) {
    starts = cumsum(c(1, head(m[1, ], -1))) # first position of each run
    return(position = starts[which(temp_bool==2)])
  } else {
    return(position = FALSE)
  }

}


my_function(x, find)
#[1] 4

my_function(y, find)
#[1] FALSE

To make this clearer, here are the results of the two apply calls for x:

apply(m, 2, function(x) x == n)
#       [,1]  [,2] [,3]  [,4]  [,5]
# [1,] FALSE  TRUE TRUE FALSE FALSE
# [2,]  TRUE FALSE TRUE FALSE  TRUE  # TRUE-TRUE on column 3 is what we seek

apply(temp_bool, 2, sum)
#[1] 1 1 2 0 1

Example data:

x <- c(1,0,0,1,1,0,1)
y <-  c(1,0,1,1,1,0)
find <- c(1,1) # as pointed out in the comments, this must be a single run of identical values
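
A note on indices: the column that matches in the apply step identifies a run, not a position in x. As a minimal sketch (the name run_starts is mine, not part of the answer), cumsum over the run lengths maps each run to the position of its first element:

```r
x <- c(1, 0, 0, 1, 1, 0, 1)
r <- rle(x)

# Each run starts one position after all earlier runs end,
# so shift the lengths by one and accumulate from 1.
run_starts <- cumsum(c(1, head(r$lengths, -1)))

run_starts     # 1 2 4 6 7
run_starts[3]  # 4, the position in x where the third run (1, 1) begins
```

For this x the third run happens to start at position 4, which is also the run index plus one. That coincidence does not hold in general, for example when the leading runs are longer.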
RLave
  • 1
    That's a very cool and elegant solution! Never heard about rle before -- could've been so useful to me in the past! – iod Nov 14 '18 at 15:54
  • 1
    But this assumes that find is going to be a run of equal values, doesn't it? It won't work for find=c(1,0), for example, right? – iod Nov 14 '18 at 15:55
  • yes, that's correct, nice catch. But maybe that's not a case op needs. I'll see if I can improve it. – RLave Nov 14 '18 at 15:58
  • But that's doing exactly the same as my one-line solution in the question, right? We get TRUE or FALSE, but we still don't get the index... @RLave – jake-ferguson Nov 14 '18 at 18:58
  • What I would like to get is, for example, 4 (the index of the first occurrence), or FALSE if the vector is not found inside the longer one – jake-ferguson Nov 14 '18 at 19:01
1

Assuming that shortVec contains only ones and longVec contains only zeros and ones, use rle and rep to create a vector lens of the same length as longVec, in which each element is replaced by the length of the run it belongs to. Then multiply lens by longVec to zero out the elements that correspond to 0 in longVec. Finally, return the indices of the elements equal to length(shortVec) and take the first one. If no run of ones has exactly that length, the result is NA.

lookup <- function(shortVec, longVec) {
  lens <- with(rle(longVec), rep(lengths, lengths))
  which(lens * longVec == length(shortVec))[1]
}

lookup(c(1,1), c(1, 0, 0, 1, 1, 0, 1))
## [1] 4

lookup(c(1,1), c(1, 0, 0, 1, 1, 1, 0, 1))
## [1] NA
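
To see why this works, it may help to look at the intermediate values for the first example. This is only a worked trace of the steps described above, using the same input; no new functions are introduced:

```r
longVec <- c(1, 0, 0, 1, 1, 0, 1)
shortVec <- c(1, 1)

# Replace every element by the length of the run it belongs to.
lens <- with(rle(longVec), rep(lengths, lengths))
lens
## [1] 1 2 2 2 2 1 1

# Zero out the positions that belong to runs of zeros.
masked <- lens * longVec
masked
## [1] 1 0 0 2 2 0 1

# Positions inside a run of ones whose length equals length(shortVec).
hits <- which(masked == length(shortVec))
hits
## [1] 4 5
```

Taking the first element of hits gives 4, the start of the run.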
G. Grothendieck
0

This works for the examples below.

a <- c(1,1)
b <- c(1,0,1,1,0,0)
c <- c(1,0,1,1,1,0)

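f <- function(x, y) {
  len.x <- length(x)
  len.y <- length(y)
  # seq_len() gives an empty loop when y is shorter than x
  for(i in seq_len(max(len.y - len.x + 1, 0))) {
    if(identical(y[i:(i + (len.x - 1))], x)){
      # the match must not continue to the left or right; the ends of y count as boundaries
      before.ok <- i == 1 || y[i - 1] != x[1]
      after.ok <- i + len.x > len.y || y[i + len.x] != x[len.x]
      if(before.ok && after.ok) {return(TRUE)}
    }
  }
  return(FALSE)
}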
f(a, b)
# TRUE
f(a, c)
# FALSE
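
Since the question asks for the index rather than TRUE/FALSE, here is a hedged sketch of the same sliding-window idea that returns the first matching start index, or FALSE if there is none. The name first_index is hypothetical, and treating the two ends of y as valid run boundaries is my assumption, not something stated in the answer.

```r
# Hypothetical helper: first start index of x in y as an exact run, else FALSE.
first_index <- function(x, y) {
  len.x <- length(x)
  len.y <- length(y)
  # seq_len() yields an empty loop when y is shorter than x
  for (i in seq_len(max(len.y - len.x + 1, 0))) {
    window <- y[i:(i + len.x - 1)]
    if (identical(window, x)) {
      # Assumption: the start and end of y count as run boundaries.
      before.ok <- i == 1 || y[i - 1] != x[1]
      after.ok  <- i + len.x > len.y || y[i + len.x] != x[len.x]
      if (before.ok && after.ok) return(i)
    }
  }
  FALSE
}

first_index(c(1, 1), c(1, 0, 0, 1, 1, 0, 1))
# 4
first_index(c(1, 1), c(1, 0, 1, 1, 1, 0))
# FALSE
first_index(c(1, 1), c(1, 1, 0))
# 1
```

Note that identical() also compares types, so an integer x such as c(1L, 1L) will not match a double y; convert both to the same type first if that matters.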
Cleland