I am trying to create a pairwise similarity matrix where I compare the similarity of each HPO term to every other HPO term using the "getSimWang" function of the R package HPOSim. The package is available here: https://sourceforge.net/projects/hposim/
I can create the pairwise similarity matrix for a subset of the HPO terms (there are ~13,000) using the following:
list1 <- c("HP:0002404", "HP:0011933", "HP:0030286")
custom <- function(x, y) {
  z <- getSimWang(x, y)
  return(z)
}
outer(list1, list1, Vectorize(custom))
          [,1]      [,2] [,3]
[1,] 1.0000000 0.6939484    0
[2,] 0.6939484 1.0000000    0
[3,] 0.0000000 0.0000000    1
sapply(list1, function(x) sapply(list1, function(y) custom(x,y)))
           HP:0002404 HP:0011933 HP:0030286
HP:0002404  1.0000000  0.6939484          0
HP:0011933  0.6939484  1.0000000          0
HP:0030286  0.0000000  0.0000000          1
However, when I expanded this code to cover all of the HPO terms, R was still calculating after 24+ hours, and when I used pbsapply to estimate how long it would take, the estimate was 20 days!
I have also tried mapply, but that only gives me the element-wise results (x1y1, x2y2, and x3y3) rather than all combinations (x1y1, x1y2, x1y3, etc.).
mapply(custom, list1, list1)
HP:0002404 HP:0011933 HP:0030286
         1          1          1
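For reference, one way mapply could be made to cover every combination is to expand the pairs first with expand.grid, which also keeps the term IDs next to each score. This is only a minimal sketch: toy_sim is a hypothetical stand-in for getSimWang (it is not part of HPOSim) so that the block runs without the package installed.

```r
# Hypothetical stand-in for getSimWang so the sketch runs without HPOSim;
# replace the body with getSimWang(x, y) for real use.
toy_sim <- function(x, y) if (x == y) 1 else 0.5

list1 <- c("HP:0002404", "HP:0011933", "HP:0030286")

# Build every (x, y) combination, keeping the term IDs as columns.
pairs <- expand.grid(x = list1, y = list1, stringsAsFactors = FALSE)

# mapply now walks the expanded pairs, so all 9 combinations are scored.
pairs$sim <- mapply(toy_sim, pairs$x, pairs$y, USE.NAMES = FALSE)

pairs
```

This still makes one function call per pair, so it should not be expected to run any faster than the outer or sapply versions.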
I also tried the xapply solution here, but when I run it I lose the information about which terms are being compared:
xapply(FUN = custom, list1, list1)
[[1]]
[1] 1
[[2]]
[1] 0.6939484
[[3]]
[1] 0
[[4]]
[1] 0.6939484
[[5]]
[1] 1
[[6]]
[1] 0
[[7]]
[1] 0
[[8]]
[1] 0
[[9]]
[1] 1
Is there a different method that I am missing in order to get the pairwise (or ideally non-redundant pairwise) calculations for the similarity? Or is this really going to take 20 days?!?
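To illustrate what I mean by non-redundant pairwise calculations, here is a rough sketch that scores each unordered pair once with combn and mirrors the result into a full matrix. It assumes the measure is symmetric and that self-similarity is 1, as the outputs above suggest. toy_sim is again a hypothetical stand-in for getSimWang, not a function from HPOSim.

```r
# Hypothetical stand-in for getSimWang; swap in getSimWang(x, y) for real use.
toy_sim <- function(x, y) if (x == y) 1 else 0.5

list1 <- c("HP:0002404", "HP:0011933", "HP:0030286")
n <- length(list1)

# All index pairs (i, j) with i < j: a 2 x choose(n, 2) matrix.
idx <- combn(n, 2)

# One similarity call per unordered pair.
vals <- mapply(function(i, j) toy_sim(list1[i], list1[j]), idx[1, ], idx[2, ])

# Start from the identity (assumes self-similarity is 1), then fill both triangles.
sim <- diag(1, n)
dimnames(sim) <- list(list1, list1)
ij <- t(idx)                            # one (row, col) pair per row
sim[ij] <- vals                         # upper triangle
sim[ij[, 2:1, drop = FALSE]] <- vals    # mirrored lower triangle

sim
```

For ~13,000 terms this cuts the number of calls from about 169 million to about 84.5 million, so at the same per-call speed it would roughly halve the 20-day estimate rather than remove it. The index matrix from combn also becomes large at that size.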