
(I expect this has already been asked/answered. If so, sorry, I'm failing to locate the answer.)

Let's say I have 6 vectors. How can I quickly check whether any element of one vector is equal to any element of any of the other vectors?

I know I could do the following, and it feels really cumbersome/pre-historic/error-prone:

any(vec1 %in% vec2, vec1 %in% vec3, vec1 %in% vec4, vec1 %in% vec5, vec1 %in% vec6,
    vec2 %in% vec3, vec2 %in% vec4, vec2 %in% vec5, vec2 %in% vec6,
    vec3 %in% vec4, vec3 %in% vec5, vec3 %in% vec6,
    vec4 %in% vec5, vec4 %in% vec6,
    vec5 %in% vec6)

Thanks.

By the way, I checked "How to find common elements from multiple vectors?" and that appears to be asking how to identify elements that are present in every vector, rather than whether any elements from any of the vectors are equal.

Daniel Fletcher
  • Are vectors of any particular type? – Severin Pappadeux Aug 21 '16 at 03:35
  • @SeverinPappadeux, in my case they're all `int`. I suppose the answer might be different if some of the vectors are of a different type. – Daniel Fletcher Aug 21 '16 at 03:41
  • Your code is wrapping `any` to all the comparisons, but I think it will create problems, i.e. if the vectors are of different lengths, the `%in%` gives a logical vector output and when we wrap with `any` (as you showed), whenever there is a single `TRUE` in any of the individual elements in the comparison, the output will be TRUE. – akrun Aug 21 '16 at 03:56
  • @akrun, this is exactly what I'm looking for. However, I agree it would probably help to know _which_ vectors have matching elements (and maybe even cooler if the solution could show which _elements_ match, without doing the additional vec1 %in% vec2 comparison, for instance). – Daniel Fletcher Aug 21 '16 at 04:03
  • I posted a solution below with the last option as the one you showed. Please check if that works for you. Also, if not, please consider to update your post with some example and expected output based on the example – akrun Aug 21 '16 at 04:05
  • Thinking out loud: `unique` on each vector, combine them into single vector, take length, `unique` on combined vector, check length – Severin Pappadeux Aug 21 '16 at 04:18
  • @akrun the first two options appear to work fine for me. To be blunt, I'm hoping for something a little simpler. However, if I fail to see any other answers that meet my simplicity goal, I'll accept your answer. As for the 3rd option, I'm actually getting `Error in combn(mget(paste0("seg", 1:6, "_index")), 2, FUN = function(x) x[[1]] %in% : number of items to replace is not a multiple of replacement length` – Daniel Fletcher Aug 21 '16 at 04:26
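The length-comparison idea from the comments above can be sketched as follows. This is a minimal sketch, assuming all vectors hold the same type (integers, as in the question). The helper name any_shared and the example data are hypothetical, and anyDuplicated is used here in place of comparing lengths before and after unique.

```r
# Hypothetical helper: TRUE if any value occurs in more than one vector.
any_shared <- function(vecs) {
  # De-duplicate within each vector first, so a repeat inside a single
  # vector is not mistaken for a cross-vector match.
  pooled <- unlist(lapply(vecs, unique))
  # anyDuplicated() returns 0 when all pooled values are distinct.
  anyDuplicated(pooled) > 0
}

# Illustrative data (assumed, not taken from the question)
vecs <- list(c(1L, 2L, 2L), c(3L, 4L), c(5L, 1L))
any_shared(vecs)                        # 1 occurs in the first and third vectors
any_shared(list(c(1L, 1L), c(2L, 3L)))  # repeats within one vector only
```

This yields a single TRUE or FALSE without enumerating the pairs, but it does not say which vectors or elements match.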

2 Answers


If you put your vectors in a list, they'll be substantially easier to work with:

# make sample data
set.seed(47)
x <- replicate(6, rpois(3, 10), simplify = FALSE) 

str(x)
# List of 6
#  $ : int [1:3] 16 12 10
#  $ : int [1:3] 9 10 6
#  $ : int [1:3] 10 14 4
#  $ : int [1:3] 7 6 4
#  $ : int [1:3] 12 8 7
#  $ : int [1:3] 7 11 8

Now iterate with lapply:

lapply(x, function(y){sapply(x, function(z){y %in% z})})

## [[1]]
##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE FALSE FALSE FALSE  TRUE FALSE
## [3,] TRUE  TRUE  TRUE FALSE FALSE FALSE
## 
## [[2]]
##       [,1] [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] FALSE TRUE FALSE FALSE FALSE FALSE
## [2,]  TRUE TRUE  TRUE FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE  TRUE FALSE FALSE
## ...    ...  ...   ...   ...   ...   ...

This returns one matrix per vector. The rows correspond to the elements of that vector, the columns to the vectors in the list, and each value indicates whether that element occurs in that vector. Each vector matches itself, so the first column of the first matrix is all TRUE, as is the second column of the second matrix, and so on. Any other TRUE indicates a cross-vector match. If the vectors differ in length, the matrices simply have different numbers of rows, because y %in% z always has the same length as y. If you'd rather have a nested list than matrices, change sapply to lapply.

Alternately, if you just want a vector of matches for each vector,

str(lapply(x, function(y){which(sapply(x, function(z){any(y %in% z)}))}))

## List of 6
##  $ : int [1:4] 1 2 3 5
##  $ : int [1:4] 1 2 3 4
##  $ : int [1:4] 1 2 3 4
##  $ : int [1:5] 2 3 4 5 6
##  $ : int [1:4] 1 4 5 6
##  $ : int [1:3] 4 5 6

where each element still contains itself as a match. Take out which for Booleans instead of indices.
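To get the single TRUE/FALSE the question asks for, the pairwise Booleans can be collected into a matrix and the self-matches on the diagonal blanked out. This is a minimal sketch, not part of the original answer: x is typed in from the str() output above rather than regenerated, and shares is a name introduced here for illustration.

```r
# Sample data copied from the str() output above
x <- list(c(16L, 12L, 10L), c(9L, 10L, 6L), c(10L, 14L, 4L),
          c(7L, 6L, 4L), c(12L, 8L, 7L), c(7L, 11L, 8L))

# shares[i, j] is TRUE when vectors i and j have a value in common
shares <- sapply(x, function(y) sapply(x, function(z) any(y %in% z)))

# Every vector trivially matches itself, so ignore the diagonal
diag(shares) <- FALSE

any(shares)                    # does any pair share an element?
which(shares, arr.ind = TRUE)  # row/column indices of the matching pairs
```

The matrix is symmetric, so each matching pair is listed twice by which(); restrict to upper.tri(shares) if each pair should appear only once.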

alistaire

We can use combn to generate every pair of vector names, fetch the vectors with get, compare them with %in%, wrap the result in any, and unlist if needed:

v1 <- unlist(combn(paste0("vec", 1:6), 2, FUN = function(x) 
             any(get(x[1]) %in% get(x[2])), simplify = FALSE))
names(v1) <- combn(paste0("vec", 1:6), 2, FUN = paste, collapse="-")

Since the OP mentioned efficiency, a faster version of combn can be used if needed.


Also, combn can be applied directly to a list, so the vectors can be collected into a list with mget and passed to combn:

v2 <- combn(mget(paste0("vec", 1:6)), 2, FUN = function(x) any(x[[1]] %in% x[[2]]))
names(v2) <- names(v1)
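If it is also useful to see which elements match, as raised in the comments on the question, one option is to swap any(... %in% ...) for intersect in the same kind of combn call. This is a minimal sketch, not part of the original answer: it uses the values from the data section of this answer, and vecs and common are names introduced here for illustration.

```r
# Values from the data section of this answer, collected in a named list
vecs <- list(vec1 = 1:6, vec2 = 2:3, vec3 = 5:7,
             vec4 = 6:8, vec5 = 9:10, vec6 = 11:12)

# Shared values for every pair; simplify = FALSE keeps results of
# different lengths in a list instead of forcing them into a matrix
common <- combn(vecs, 2, FUN = function(p) intersect(p[[1]], p[[2]]),
                simplify = FALSE)
names(common) <- combn(names(vecs), 2, FUN = paste, collapse = "-")

# Keep only the pairs that actually share something
common[lengths(common) > 0]
```

Checking length(common[lengths(common) > 0]) > 0 then gives the overall TRUE/FALSE as well.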

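Also, since the OP wraps any over all the comparisons, we can do the same with a single any. Passing simplify = FALSE stops combn from trying to fit results of different lengths into one matrix, which can otherwise fail with "number of items to replace is not a multiple of replacement length" when the vectors differ in length:

any(unlist(combn(mget(paste0("vec", 1:6)), 2, FUN = function(x) x[[1]] %in% x[[2]], simplify = FALSE)))

This returns the same single TRUE/FALSE as the OP's version, though it no longer shows which vectors match.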

data

vec1 <- 1:6
vec2 <- 2:3
vec3 <- 5:7
vec4 <- 6:8
vec5 <- 9:10
vec6 <- 11:12
akrun
  • is the second option independent from the first for the match location identification? Seems like `names(v2) <- names(v1)` might depend on the first option because of the `v1` inclusion. – Daniel Fletcher Aug 21 '16 at 04:07
  • @DanielFletcher I didn't want to type the `names(v1)` twice as it is the same as the first case. Otherwise, it is the same. – akrun Aug 21 '16 at 04:10
  • I accepted alistaire's answer because it feels the simplest to me. Technically, we could argue it fails to give a singular `TRUE` or `FALSE`, as requested in the question. However, it's pretty close, and I'm unable to get the `any` option from this answer to work for me. – Daniel Fletcher Aug 21 '16 at 04:59