Find the elements common to multiple vectors that appear in at least a given percentage of them

Question

Let's say I have 4 vectors:

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

I would like to select the names that overlap across those vectors, with the condition that a name has to appear in at least 3 of the 4 vectors. Ideally, the percentage of vectors in which a name has to be present should be easy to change.

Can I modify `intersect` somehow?

Will the same name ever appear multiple times in the same vector? — bouncyball, Jan 05 '17 at 14:28
http://stackoverflow.com/questions/3695677/how-to-find-common-elements-from-multiple-vectors — USER_1, Jan 05 '17 at 14:29
It will not. Names within a vector should be distinct. `USER_1`, does it answer my question? — Shaxi Liver, Jan 05 '17 at 14:30
With `table`: `temp <- table(c(a,b,c,d)); names(temp[temp > 2])`. — lmo, Jan 05 '17 at 14:32

bouncyball · Accepted Answer · 2017-01-05T14:34:47.700

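I think this would work. The `table` function does most of the heavy lifting: it counts how often each name occurs across all the vectors, and because names are unique within each vector, that count equals the number of vectors containing the name.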

find_perc <- function(..., perc = .75){
    list_len <- length(list(...)) # how many vectors
    tab_it <- table(c(...)) # tabulate all the names
    tab_it_perc <- tab_it / list_len # calculate the frequencies
    names(tab_it_perc[tab_it_perc >= perc]) # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg"   "Mark"   "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg"   "Igor"   "Kate"   "Mark"   "Mary"   "Mathew" "Robin"  "Tobias"
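
The function above relies on each name appearing at most once per vector, as confirmed in the comments. If that cannot be guaranteed, a possible variant is sketched below. It is not part of the original answer: the name `find_perc_list` and the choice to pass the vectors as a single list are assumptions made for illustration. Each vector is de-duplicated with `unique` before tabulating, so a repeated name still counts only once per vector.

```r
# Hypothetical variant of find_perc(): takes a list of vectors and
# de-duplicates each one, so a repeated name counts once per vector.
find_perc_list <- function(vecs, perc = 0.75) {
    stopifnot(is.list(vecs), length(vecs) > 0)
    deduped <- lapply(vecs, unique)      # drop repeats within each vector
    counts <- table(unlist(deduped))     # number of vectors containing each name
    # keep the names whose share of vectors reaches the threshold
    names(counts)[as.vector(counts) / length(vecs) >= perc]
}

a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

find_perc_list(list(a, b, c, d))
# [1] "Greg"   "Mark"   "Mathew"

# "Kate" is repeated in the first vector but is still present in only 2 of 4
find_perc_list(list(c(a, "Kate"), b, c, d))
# [1] "Greg"   "Mark"   "Mathew"
```

Without the `unique` step, the second call would count "Kate" three times and return it as if it were present in 3 of the 4 vectors.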
