0

I have two for loops in R with a data around 150000 observation. I tried apply() family functions but they were slower than for loop in my case. here is my code: where k=500 and N= 150000, x is location at each time t (for all observation) and xm is specific x with a specific coordination that I filtered here. At each time j we observe xm so we remove it from the data and fit the model with the rest of dataset. I had an if else condition here that removed it in order to make the loop faster.

It's so slow, I am so thankful for your help!

xs = 0:200
result= matrix(0, k,N ) 

for (j in 1: N){
for ( i in 1:k){
  a <- sum(dnorm(xs[i],xm[-j],bx))
  b <-  sum(dnorm(xs[i],x[-ind[j]],bx))
  result[i,j]<-a/b
}   
}
DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
  • You are doing 75 million iterations. It is going to be slow in any interpreted language. Have a look at [this](https://stackoverflow.com/a/7142846/12545041) answer for example. – SamR Apr 12 '22 at 13:22
  • can you define `x`,`xs`, `xm`, `bx`,`ind` (also `k`, `N`) in you code? you have left a lot out... – minem Apr 12 '22 at 13:51
  • `ind = which(u==1); ms= m[ind,]; xm= x[ind]; bx = 2; k= length(xs)=501; N=length(xm)=150000; xs = 0:500; Sm= diag(150,4,4)` – Nate Lee Apr 13 '22 at 05:12

1 Answers1

0

Using dummy values ind, x, and xm, here is a solution that runs in about 10 seconds on my machine (>1000 times faster than the original code).

# start with a small N for verification
N <- 15e2L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)
k <- 501L
xs <- 0:500
bx <- 2

system.time({
  # proposed solution
  a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
  b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
  result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#>    user  system elapsed 
#>    0.08    0.02    0.10

system.time({
  # OP's solution
  result2 <- matrix(0, k, N)
  
  for (j in 1:N){
    for (i in 1:k){
      a <- sum(dnorm(xs[i], xm[-j], bx))
      b <- sum(dnorm(xs[i], x[-ind[j]], bx))
      result2[i,j] <- a/b
    }
  }
})
#>    user  system elapsed 
#>  109.42    0.80  110.90

# check that the results are the same
all.equal(result1, result2)
#> [1] TRUE

# use a large N
N <- 15e4L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)

system.time({
  a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
  b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
  result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#>    user  system elapsed 
#>    8.62    1.10    9.73
jblood94
  • 10,340
  • 1
  • 10
  • 15
  • thank you so much! not sure why I get this error when trying to run your code: Error in outer(xs, xm, function(x, y) dnorm(x, y, bx)) : unused argument (function(x, y) dnorm(x, y, bx)) – Nate Lee Apr 16 '22 at 02:58
  • What version of R are you using? It works here: https://rdrr.io/snippets/ – jblood94 Apr 17 '22 at 01:00
  • the code is running now but I guess it does not exclude the current time observation and calculates probability of all observation at each time point – Nate Lee Apr 18 '22 at 02:19
  • Take a closer look. It performs the same calculations as your double `for` loop, as demonstrated by `all.equal(result1, result2)` returning `TRUE`. To get the incorrect result that you are describing, it would be `rowSums(a)/rowSums(b)` instead of `(rowSums(a) - a)/(rowSums(b) - b)`. – jblood94 Apr 18 '22 at 11:25
  • in this code { a <- outer(xs, xm, function(x, y) dnorm(x, y, bx)) b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx)) result1 <- (rowSums(a) - a)/(rowSums(b) - b) } , when I run it the results of a and b are equal with shouldnot be. – Nate Lee Apr 19 '22 at 00:09
  • My guess is the error is upstream of the `a`, `b` calculations. The only way for `a` to be the same as `b` is if `x[ind]` is the same as `xm`. Do you get something different with the double `for` loop? – jblood94 Apr 19 '22 at 01:02
  • yes I get different – Nate Lee Apr 20 '22 at 19:01
  • Can you post the code that gave you a different result between the `outer` solution and the double `for` loop? When I run the code you posted along side my code, I always get the same result. I change `N`, `xm`, `x`, `ind`, `k`, `xs`, and `bx`. – jblood94 Apr 20 '22 at 21:18