
I am trying to create a pairwise similarity matrix that compares each HPO term to every other HPO term, using the "getSimWang" function of the R package HPOSim. The package is available here: https://sourceforge.net/projects/hposim/

I can create the pairwise similarity matrix for a subset of the HPO terms (there are ~13,000) using the following:

list1 <- c("HP:0002404", "HP:0011933", "HP:0030286")

custom <- function(x, y) {
  z <- getSimWang(x, y)
  return(z)
}

outer(list1, list1, Vectorize(custom))
         [,1]      [,2] [,3]
[1,] 1.0000000 0.6939484    0
[2,] 0.6939484 1.0000000    0
[3,] 0.0000000 0.0000000    1

sapply(list1, function(x) sapply(list1, function(y) custom(x,y)))
           HP:0002404 HP:0011933 HP:0030286
HP:0002404  1.0000000  0.6939484          0
HP:0011933  0.6939484  1.0000000          0
HP:0030286  0.0000000  0.0000000          1

However, when I tried to extend this code to all of the HPO terms, R ran for more than 24 hours without finishing. When I used pbsapply to estimate the remaining time, it reported about 20 days!
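For scale, here is a rough back-of-the-envelope calculation of how many getSimWang calls are involved. It assumes about 13,000 terms and that the 20-day estimate covers the full n x n grid (both figures come from the numbers above; the variable names are just for illustration). Under those assumptions each call takes roughly 10 ms, and computing only the unordered pairs would about halve the total:

```r
n <- 13000                        # approximate number of HPO terms
all_pairs <- n^2                  # calls made by outer() or the nested sapply
unique_pairs <- n * (n - 1) / 2   # unordered pairs, diagonal excluded

days_full <- 20                   # pbsapply estimate for the full grid (assumed)
sec_per_call <- days_full * 24 * 3600 / all_pairs

# Estimated days if only the unordered pairs are computed
est_unique_days <- unique_pairs * sec_per_call / (24 * 3600)

print(c(all_pairs = all_pairs, unique_pairs = unique_pairs))
print(c(sec_per_call = sec_per_call, est_unique_days = est_unique_days))
```

So the loop construct itself (outer, sapply, mapply) is probably not the bottleneck; the per-call cost of the similarity function is.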

I have also tried mapply, but that only gives me the element-wise calculations (x1y1, x2y2, and x3y3) rather than all combinations (x1y1, x1y2, x1y3, etc.).

mapply(custom, list1, list1)

HP:0002404 HP:0011933 HP:0030286 
         1          1          1

I also tried the xapply solution here, but when I run it I lose the information about which terms are being compared:

xapply(FUN = custom, list1, list1)
[[1]]
[1] 1

[[2]]
[1] 0.6939484

[[3]]
[1] 0

[[4]]
[1] 0.6939484

[[5]]
[1] 1

[[6]]
[1] 0

[[7]]
[1] 0

[[8]]
[1] 0

[[9]]
[1] 1
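One way to keep the labels with an "all combinations" approach might be to build the pairs explicitly with expand.grid and store each result next to its two terms. The sketch below uses a hypothetical stand-in, toy_sim, in place of getSimWang so that it runs without HPOSim; the column names term1, term2, and sim are also just illustrative:

```r
# Hypothetical stand-in for getSimWang: 1 for identical terms, 0 otherwise
toy_sim <- function(x, y) as.numeric(x == y)

terms <- c("HP:0002404", "HP:0011933", "HP:0030286")

# One row per ordered pair, so the term labels stay attached to each value
pairs <- expand.grid(term1 = terms, term2 = terms, stringsAsFactors = FALSE)
pairs$sim <- mapply(toy_sim, pairs$term1, pairs$term2, USE.NAMES = FALSE)

# expand.grid varies term1 fastest, which matches column-major matrix filling
sim_mat <- matrix(pairs$sim, nrow = length(terms),
                  dimnames = list(terms, terms))
print(pairs)
print(sim_mat)
```

This still evaluates every ordered pair, so it fixes the labelling but not the running time.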

Is there a different method that I am missing to get the pairwise (or, ideally, non-redundant pairwise) similarity calculations? Or is this really going to take 20 days?
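For the non-redundant version, a minimal sketch could compute only the pairs with i < j using combn and mirror the values into a labelled matrix. It assumes the similarity is symmetric and that every term has similarity 1 with itself, as the small example output above suggests. The helper pairwise_sim and the stand-in toy_sim are hypothetical names; sim_fun is where getSimWang would be passed in:

```r
# Hypothetical stand-in for getSimWang so the sketch runs without HPOSim
toy_sim <- function(x, y) if (x == y) 1 else 0.5

pairwise_sim <- function(terms, sim_fun) {
  n <- length(terms)                  # assumes at least two terms
  sim_mat <- diag(1, n)               # assumes self-similarity is 1
  dimnames(sim_mat) <- list(terms, terms)

  idx <- combn(n, 2)                  # 2 x choose(n, 2) index pairs with i < j
  vals <- mapply(function(i, j) sim_fun(terms[i], terms[j]),
                 idx[1, ], idx[2, ])

  sim_mat[cbind(idx[1, ], idx[2, ])] <- vals   # upper triangle
  sim_mat[cbind(idx[2, ], idx[1, ])] <- vals   # mirrored lower triangle
  sim_mat
}

terms <- c("HP:0002404", "HP:0011933", "HP:0030286")
sim_mat <- pairwise_sim(terms, toy_sim)
print(sim_mat)
```

With the real function this would be called as pairwise_sim(list1, getSimWang). It roughly halves the number of calls but does not make each call faster, and for ~13,000 terms the index matrix from combn alone holds about 84 million pairs, so working in chunks of rows (possibly spread over cores with the parallel package) may be more practical than a single call.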

ahhn