Let's say I have 4 vectors:

a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")

I would like to select the names that overlap across these vectors, with the condition that a name has to appear in at least 3 of the 4 vectors. I would also like it to be easy to change the percentage of vectors in which a name has to be present.

Can I modify intersect somehow?

Shaxi Liver

1 Answer

I think this would work. We use the table function to do most of the heavy lifting.

find_perc <- function(..., perc = .75){
    list_len <- length(list(...)) # how many vectors
    tab_it <- table(c(...)) # tabulate all the names
    tab_it_perc <- tab_it / list_len # calculate the frequencies
    names(tab_it_perc[tab_it_perc >= perc]) # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg"   "Mark"   "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg"   "Igor"   "Kate"   "Mark"   "Mary"   "Mathew" "Robin"  "Tobias"
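
One caveat with the function above: because it tabulates the concatenated vectors, a name repeated inside a single vector is counted more than once. The following is a minimal sketch of a variant that counts each name at most once per vector. The name find_perc_unique is hypothetical and not part of the original answer; it assumes the inputs are character vectors like a, b, c and d from the question.

```r
# Hypothetical variant of find_perc(): each vector contributes a name at most once.
find_perc_unique <- function(..., perc = 0.75) {
  vecs <- list(...)                      # collect the input vectors
  deduped <- lapply(vecs, unique)        # drop repeats within each vector
  counts <- table(unlist(deduped))       # number of vectors containing each name
  keep <- as.vector(counts / length(vecs) >= perc)
  names(counts)[keep]                    # names present in at least perc of the vectors
}

a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

find_perc_unique(a, b, c, d)
# [1] "Greg"   "Mark"   "Mathew"

# "Mark" appears three times, but only in one of the four vectors
find_perc_unique(c("Mark", "Mark", "Mark"), "Kate", "Greg", "Igor")
# character(0)
```

For the vectors in the question, which contain no repeated names, this returns the same result as find_perc.
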
bouncyball