
The following code works but, as expected, takes ages to run for large vectors.

What would be the vectorised way to accomplish the same task?

x <- seq(0,10,0.01)
y <- seq(0,10,0.01)
df <- data.frame(vector1 = rnorm(10000), vector2 = rnorm(10000), vector3 = rnorm(10000))


m.out <- matrix(nrow=length(x),ncol = length(y))

a <- df$vector1
b <- df$vector2
c <- df$vector3

# Fill m.out[i, j] with the correlation between x[i]*a + y[j]*b and c
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    m.out[i, j] <- cor(x[i] * a + y[j] * b, c, use = "complete.obs", method = "pearson")
  }
}

Thanks,

Mario Reyes

1 Answer

Please see the version below, which replaces the explicit loops with mapply and expand.grid. To return to the wide format you can use dcast from the reshape2 package. Note, however, that it still takes some time:

set.seed(123)
x <- seq(0, 10, 0.01)
y <- seq(0, 10, 0.01)

# simulation
df <- data.frame(vector1 = rnorm(length(x)), vector2 = rnorm(length(x)), vector3 = rnorm(length(x)))
a <- df$vector1
b <- df$vector2
c <- df$vector3

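# All (x, y) combinations: Var1 holds x, Var2 holds y
v <- expand.grid(x, y)
v$out <- mapply(function(n, m) cor(n * a + m * b, c, use = "complete.obs", method = "pearson"), v[, 1], v[, 2])

# Reshape to wide format: rows are x values, columns are y values
library(reshape2)
z <- dcast(v, Var1 ~ Var2, value.var = "out")
rownames(z) <- z$Var1
z <- z[, -1]
head(z[, 1:5])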

Output:

               0          0.01          0.02          0.03          0.04
0             NA  0.0140699293  0.0140699293  0.0140699293  0.0140699293
0.01 -0.01383734  0.0003350528  0.0065542508  0.0090938390  0.0103897953
0.02 -0.01383734 -0.0059841841  0.0003350528  0.0042062076  0.0065542508
0.03 -0.01383734 -0.0086178379 -0.0035752709  0.0003350528  0.0031310581
0.04 -0.01383734 -0.0099713568 -0.0059841841 -0.0024814273  0.0003350528
0.05 -0.01383734 -0.0107798236 -0.0075458061 -0.0045052606 -0.0018627055
Artem
Thanks for the reply @Artem; however, after some benchmarking, my original approach takes roughly the same amount of time as your version. Cheers – Mario Reyes Sep 14 '18 at 05:01
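
Similar timings are what one would expect, since both the double loop and mapply call cor once per (x, y) pair. The following is a hedged sketch of an alternative and is not part of the answer above. It assumes the Pearson method and uses the identity cor(x*a + y*b, c) = (x*cov(a,c) + y*cov(b,c)) / (sd(c) * sqrt(x^2*var(a) + 2*x*y*cov(a,b) + y^2*var(b))), so the three covariances and three variances are computed once and the grid is filled with outer. The names m.fast, ok, num and den are illustrative, and the simulated data stands in for the real vectors.

```r
set.seed(123)
x <- seq(0, 10, 0.01)
y <- seq(0, 10, 0.01)

# Simulated stand-ins for df$vector1, df$vector2, df$vector3
a <- rnorm(10000)
b <- rnorm(10000)
c <- rnorm(10000)

# Mimic use = "complete.obs": keep only rows where a, b and c are all present
ok <- complete.cases(a, b, c)
a <- a[ok]; b <- b[ok]; c <- c[ok]

# Scalar summaries, computed once instead of once per grid cell
cov_ac <- cov(a, c)
cov_bc <- cov(b, c)
cov_ab <- cov(a, b)
var_a  <- var(a)
var_b  <- var(b)
sd_c   <- sd(c)

# Numerator: cov(x*a + y*b, c) for every (x[i], y[j]) pair
num <- outer(x * cov_ac, y * cov_bc, "+")

# Denominator: sd(c) * sd(x*a + y*b) for every (x[i], y[j]) pair
den <- sd_c * sqrt(outer(x^2 * var_a, y^2 * var_b, "+") + 2 * cov_ab * outer(x, y))

# m.fast[i, j] corresponds to m.out[i, j] in the original loop
m.fast <- num / den

# Spot check one cell against cor()
all.equal(m.fast[5, 7], cor(x[5] * a + y[7] * b, c))
```

With x = 0 and y = 0 the combination is constant, so that cell comes out as NaN (0/0) here where cor returns NA. The sketch does not carry over to method = "spearman" or "kendall", because rank-based correlations are not linear in a and b.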