I have a vector of strings x = c("hello", "world") and another vector y = c("hello", "world", "how", "are", "you"). I want to see which elements of x are in y. For small vectors this can easily be done using x %in% y.
However, I am looking for a more efficient way to do this. Normally we would sort y first in O(n log n) time; then, for each string in x, we can do a lookup in O(log n) time. I am worried that %in% does a full pass over y for each element of x it looks up.
Is there a way to take advantage of sorting and binary search in R? Or is there a way to build a hash set from y for fast lookups?
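
To make this concrete, here is a rough sketch in base R of the two approaches I have in mind. The helper name binary_contains is made up for illustration, and using an environment as a hash set is only my assumption about how this might be done, not necessarily the idiomatic way.

```r
x <- c("hello", "world")
y <- c("hello", "world", "how", "are", "you")

# Baseline to compare against.
baseline <- x %in% y

# Approach 1: sort y once, then binary-search it for each element of x.
# sort() and `<` both use the current locale's collation, so the ordering
# assumed by the search matches the ordering produced by the sort.
y_sorted <- sort(y)

# Hypothetical helper: TRUE if `key` occurs in the sorted character vector.
binary_contains <- function(key, sorted) {
  lo <- 1L
  hi <- length(sorted)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[mid] == key) return(TRUE)
    if (sorted[mid] < key) lo <- mid + 1L else hi <- mid - 1L
  }
  FALSE
}

via_search <- vapply(x, binary_contains, logical(1),
                     sorted = y_sorted, USE.NAMES = FALSE)

# Approach 2: use an environment as a hash set keyed by the strings in y.
# Assumes y contains no empty strings, which cannot be used as names.
y_set <- new.env(hash = TRUE, parent = emptyenv())
for (s in y) assign(s, TRUE, envir = y_set)

via_hash <- vapply(x, exists, logical(1),
                   envir = y_set, inherits = FALSE, USE.NAMES = FALSE)
```

Both via_search and via_hash should agree with x %in% y on this input. What I do not know is whether either is actually faster than %in% once the explicit R-level loops are taken into account.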