R: find indexes of vector in another vector (if it exists)

Question

I would like to know the starting index of a vector in another vector. For example, for c(1, 1) and c(1, 0, 0, 1, 1, 0, 1) it would be 4.

Importantly, I want to match exactly that vector. Thus, for c(1, 1) inside c(1, 0, 1, 1, 1, 0) the result should be FALSE, because the run c(1, 1, 1) is not the same as c(1, 1).

For now I am checking whether the short vector is contained in the long one like this:

any(with(rle(longVec), lengths[as.logical(values)]) == length(shortVec))

But I don't know how to determine the index of it...
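
For reference, a minimal sketch of what the rle-based check above computes on the two examples from the question. The names longVec1, longVec2 and contains_run are introduced here for illustration only and are not part of the original post.

```r
# Illustrative names, not from the original post
shortVec <- c(1, 1)
longVec1 <- c(1, 0, 0, 1, 1, 0, 1)
longVec2 <- c(1, 0, 1, 1, 1, 0)

# rle() splits a vector into runs of equal values
r <- rle(longVec1)
r$lengths  # 1 2 2 1 1
r$values   # 1 0 1 0 1

# Keep the lengths of the runs of ones and compare them with length(shortVec)
contains_run <- function(shortVec, longVec) {
  any(with(rle(longVec), lengths[as.logical(values)]) == length(shortVec))
}

contains_run(shortVec, longVec1)  # TRUE: there is a run of exactly two ones
contains_run(shortVec, longVec2)  # FALSE: the runs of ones have lengths 1 and 3
```

The check only says whether a run of the right length exists; the run lengths alone do not say where in the long vector that run starts.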

c(1,1) also doesn't equal c(1,1,0). How do you decide that your example is false but mine should give you 1 as the result? — iod, Nov 14 '18 at 15:39
I don't understand what you mean, but for c(1,1) inside c(1,1,0) the function I prepared returns TRUE. I would just need the index (1 in this case) as well — jake-ferguson, Nov 14 '18 at 15:42
Related: [Get indexes of a vector of numbers in another vector](https://stackoverflow.com/questions/48660606/get-indexes-of-a-vector-of-numbers-in-another-vector) — Henrik, Nov 14 '18 at 15:57
Based on the link provided by @Henrik, you can start with `shortVec <- c(1,1); longVec <- c(1,0,0,1,1,0,1); idx <- which(longVec == shortVec[1]); idx[sapply(idx, function(i) all(longVec[i:(i+(length(shortVec)-1))] == shortVec))][1]` — nghauran, Nov 14 '18 at 16:27
@jake-ferguson dude, why did you delete your other question? I put quite a bit of effort in answering it. — Gregor Thomas, Nov 15 '18 at 04:01

RLave · Answer 1 · 2018-11-15T07:40:29.783

3

This function should work:

my_function <- function(x, find) {
  # build two 2-row matrices from rle(): row 1 = run lengths, row 2 = run values
  m <- matrix(unlist(rle(x)), nrow = 2, byrow = TRUE)
  n <- matrix(unlist(rle(find)), nrow = 2, byrow = TRUE)

  # compare each column of m (one run of x) with n; this gives a matrix of TRUE/FALSE
  temp_bool <- apply(m, 2, function(col) col == n)
  # then sum by column: a 2 means both the run length and the run value match
  temp_bool <- apply(temp_bool, 2, sum)

  # updated part
  if (any(temp_bool == 2)) {
    # start position of every run in x: 1, then 1 + total length of the preceding runs
    starts <- cumsum(c(1, head(m[1, ], -1)))
    return(starts[temp_bool == 2])
  } else {
    return(FALSE)
  }
}


my_function(x, find)
#[1] 4

my_function(y, find)
#[1] FALSE

To make this clearer, here are the results of the two apply calls:

apply(m, 2, function(x) x == n)
#       [,1]  [,2] [,3]  [,4]  [,5]
# [1,] FALSE  TRUE TRUE FALSE FALSE
# [2,]  TRUE FALSE TRUE FALSE  TRUE  # TRUE-TRUE on column 3 is what we seek

apply(temp_bool, 2, sum)
#[1] 1 1 2 0 1

Example data:

x <- c(1,0,0,1,1,0,1)
y <-  c(1,0,1,1,1,0)
find <- c(1,1) # as pointed out in the comments, this needs to be a pair of the same number
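
The matching column in the apply output above identifies a run (the third one), while the question asks for a position in x. One way to translate between the two, sketched here with the hypothetical helper run_starts, is a cumulative sum over the run lengths.

```r
# Hypothetical helper: start position of every run of a vector
run_starts <- function(x) {
  lens <- rle(x)$lengths
  # The first run starts at 1; each later run starts right after the previous ones end
  cumsum(c(1, head(lens, -1)))
}

x <- c(1, 0, 0, 1, 1, 0, 1)
run_starts(x)     # 1 2 4 6 7
run_starts(x)[3]  # 4: the third run, the pair of ones, starts at position 4

# Run index and start position differ in general
z <- c(0, 0, 0, 1, 1)
run_starts(z)[2]  # 4: the second run starts at position 4
```

In the first example the run index plus one and the start position happen to coincide; the second vector shows a case where they do not.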

edited Nov 15 '18 at 07:40

answered Nov 14 '18 at 15:46

RLave


That's a very cool and elegant solution! Never heard about rle before -- could've been so useful to me in the past! – iod Nov 14 '18 at 15:54
But this assumes that find is going to be a run of equal values, doesn't it? It won't work for find=c(1,0), for example, right? – iod Nov 14 '18 at 15:55
yes, that's correct, nice catch. But maybe that's not a case op needs. I'll see if I can improve it. – RLave Nov 14 '18 at 15:58
But that's doing exactly the same as my one-line solution in the question, right? We get T or F, but we still don't get the index... @RLave – jake-ferguson Nov 14 '18 at 18:58
What I would like to get is, for example, 4 (the index of the first occurrence), or FALSE if the vector is not found inside the longer one – jake-ferguson Nov 14 '18 at 19:01
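
For a find that is not a run of equal values, such as c(1, 0), a plain sliding-window comparison is one option. The sketch below uses the hypothetical name find_starts and returns every start position at which the window equals find. Unlike the rle approach it does not require the match to be a complete run, so c(1, 1) is also reported inside c(1, 1, 1). It assumes the vectors contain no NA values.

```r
# Hypothetical helper: all start positions where `find` occurs in `x`
find_starts <- function(x, find) {
  n <- length(find)
  if (n == 0 || n > length(x)) return(integer(0))
  # Candidate start positions for a window of length n
  starts <- seq_len(length(x) - n + 1)
  # Keep the starts whose window matches `find` element by element
  hits <- vapply(starts, function(i) all(x[i:(i + n - 1)] == find), logical(1))
  starts[hits]
}

find_starts(c(1, 0, 0, 1, 1, 0, 1), c(1, 0))  # 1 5
find_starts(c(1, 0, 1, 1, 1, 0), c(1, 1))     # 3 4
```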

G. Grothendieck · Answer 2 · 2018-11-14T16:11:18.490

Assuming that shortVec contains only ones and longVec contains only zeros and ones, use rle and rep to create a vector lens of the same length as longVec, in which each element is replaced by the length of the run it belongs to. Then multiply lens by longVec to zero out the elements that correspond to a 0 in longVec. Finally, take the indices of the elements equal to length(shortVec) and return the first one.

lookup <- function(shortVec, longVec) {
  lens <- with(rle(longVec), rep(lengths, lengths))
  which(lens * longVec == length(shortVec))[1]
}

lookup(c(1,1), c(1, 0, 0, 1, 1, 0, 1))
## [1] 4

lookup(c(1,1), c(1, 0, 0, 1, 1, 1, 0, 1))
## [1] NA
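
To see why this works, it can help to print the intermediate vectors for the first example. This only illustrates the steps described above; the variable name masked is introduced here and is not part of the answer.

```r
longVec  <- c(1, 0, 0, 1, 1, 0, 1)
shortVec <- c(1, 1)

# Replace every element by the length of the run it belongs to
lens <- with(rle(longVec), rep(lengths, lengths))
lens                                   # 1 2 2 2 2 1 1

# Multiplying by longVec zeroes the positions that hold a 0
masked <- lens * longVec               # `masked` is an illustrative name
masked                                 # 1 0 0 2 2 0 1

# Positions inside a run of ones whose length equals length(shortVec)
which(masked == length(shortVec))      # 4 5
which(masked == length(shortVec))[1]   # 4
```

Every element of a matching run is flagged, which is why the function takes the first index. When no run has the right length, which returns an empty vector and indexing it with [1] gives NA.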

score 0 · Answer 3 · answered Nov 14 '18 at 15:54

This works for the examples below.

a <- c(1,1)
b <- c(1,0,1,1,0,0)
c <- c(1,0,1,1,1,0)

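f <- function(x, y) {
  len.x <- length(x)
  len.y <- length(y)
  # slide a window of length len.x over y
  for(i in 1:(len.y - (len.x - 1))) {
    if(identical(y[i:(i + (len.x - 1))], x)){
      # accept the match only if the neighbours on both sides differ from x,
      # i.e. the match is not part of a longer run
      if(y[i + len.x] != x[len.x] & y[i - 1] != x[1]) {return(TRUE)}
    }
  }
  return(FALSE)
}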
f(a, b)
# TRUE
f(a, c)
# FALSE

answered Nov 14 '18 at 15:54

Cleland


It fails if x exists as the first or last elements of y though – Cleland Nov 14 '18 at 16:02
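
One way to handle the edge cases mentioned in the comment above is to treat a missing neighbour (before the first or after the last element) as a valid boundary. The sketch below, with the hypothetical name f_index, also returns the start index rather than TRUE, which is what the question asks for. It assumes x and y contain no NA values.

```r
# Hypothetical variant of f(): returns the first start index, or FALSE
f_index <- function(x, y) {
  len.x <- length(x)
  len.y <- length(y)
  if (len.x == 0 || len.x > len.y) return(FALSE)
  for (i in seq_len(len.y - len.x + 1)) {
    last <- i + len.x - 1
    if (all(y[i:last] == x)) {
      # A neighbour that does not exist counts as a valid boundary
      left_ok  <- i == 1 || y[i - 1] != x[1]
      right_ok <- last == len.y || y[last + 1] != x[len.x]
      if (left_ok && right_ok) return(i)
    }
  }
  FALSE
}

f_index(c(1, 1), c(1, 1, 0))              # 1
f_index(c(1, 1), c(0, 1, 1))              # 2
f_index(c(1, 1), c(1, 0, 0, 1, 1, 0, 1))  # 4
f_index(c(1, 1), c(1, 0, 1, 1, 1, 0))     # FALSE
```

The short-circuiting || means the neighbour is only read when it exists, so no out-of-range element reaches the comparison.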
