3

I am running a simulation on non-linear regression where one part of this simulation I need to use the following function

Function arguments:

  • x is a design matrix of dimension N by p
  • y is a response vector of dimension N by 1
  • u is an index-vector of dimension N by 1 (usually generated from runif(N))

Here is a complete example:

ndata = 40
n     = 50
t     = 20

N     = n * t
p     = 10   # p is usually larger than 100 

res = array(NA, dim = c(N, 2*p*(n-1), ndata))

for (j in 1:ndata) {
 u  = runif(N)
 gu = cbind(2 * sin(2*pi*u), u * (1-2*u), exp(-u + 0.5), matrix(1,nrow = 1000) %x% matrix(1:7,nrow = 1))

 Mu     = rnorm(n-1)
 Lambda = rnorm(t-1)
 D      = t(cbind(matrix(-1, n-1, 1), diag(n-1))) %x% matrix(1,t,1)
 S      = matrix(1,n,1) %x% t(cbind(matrix(-1, t-1, 1), diag(t-1)))

 x      = MASS::mvrnorm(N, rep(0,p), 0.5**abs(outer(1:p,1:p,'-')))
 
 eps    = rnorm(N)
 y      = apply(x * gu, 1, sum) + D %*% Mu + S %*% Lambda + eps

 res[,,j] = Main_fn(x, y, u)
}
My_den_fn <- function(x,h) {
  idx    = 0.75 * (1 - (t/h)**2) / h
  kernal = 0.50 * (abs(idx) + idx)
  return(kernal)
}

Main_fn <- function (x, y, u) {
  
  N = dim(x)[1]
  p = dim(x)[2]
  
  h  = sd(u) * N**(- 0.2) * 2
  
  n = 50
  t = 20
  D = t(cbind(matrix(-1, n-1, 1), diag(n-1))) %x% matrix(1,t,1)
  S = matrix(1,n,1) %x% t(cbind(matrix(-1, t-1, 1), diag(t-1)))
  M = cbind(D, S)
  
  ab = array(NA, dim = c(N, 2 * p))
  
  for(i in 1:N) {
    W = diag(My_den_fn(u - u[i], h)) # My_den_fn will return values >= 0

    Q = diag(N) - M %*% solve(t(M) %*% W %*% M + 1e-3 * diag(n+t-2)) %*% crossprod(M, W) # Projection Matrix Q
    
    W_new = crossprod(Q,W %*% Q)
    
    G = cbind(x, x*(u - u[i]))
    
    ab[i,] = solve(crossprod(G,W_new)%*% G + 1e-3 * diag(2*p)) %*% (crossprod(G,W_new)%*% y)
  }
  
  return(ab)
}

Obviously this is not the most efficient code. I know that vectorizing would absolutely make things way faster, but I have 2 issues here

  • I am not sure how to vectorize this function
  • There are 2 times where I need to invert a matrix

Therefore, I tried my best to make Main_fn a bit faster without vectorizing and I got the following

Main_fn_A <- function (x, y, u) {
  
  N = dim(x)[1]
  p = dim(x)[2]
  
  h  = sd(u) * N**(- 0.2) * 2

  n = 50
  t = 20
  D = t(cbind(matrix(-1, n-1, 1), diag(n-1))) %x% matrix(1,t,1)
  S = matrix(1,n,1) %x% t(cbind(matrix(-1, t-1, 1), diag(t-1)))
  M = cbind(D, S)
  
  U = outer(u,u,'-')
  
  W     = sapply(X = 1:N, FUN = function(i){My_den_fn(U[i,],h)}, simplify = TRUE) # values >= 0
  
  Q     = lapply(X = 1:N, FUN = function(i){diag(N)-M %*% solve(crossprod(M,W[,i]*M)+1e-3*diag(n+t-2)) %*%t (M*W[,i])})
  
  W_new = lapply(X = 1:N, FUN = function(i){crossprod(Q[[i]],W[,i]*Q[[i]])})
  
  GAMMA = lapply(X = 1:N, FUN = function(i){cbind(x, x*U[,i])})
  
  
  ab = array(NA, dim = c(N, 2 * p))
  for(i in 1:N) {
    
    ab[i,] = solve(crossprod(GAMMA[[i]],W_new[[i]]) %*% GAMMA[[i]] + 1e-3 * diag(2*p)) %*% (crossprod(GAMMA[[i]],W_new[[i]]) %*% y)
  }
  
  return(ab)
}

Unfortunately, the code is still very slow with moderate sample sizes (i.e. n = 1000, p = 100) and it takes roughly 12hrs to run 100 replicates of the same size. I read that doing matrix operation in cpp is much faster. Unfortunately, I have no experience with other programming languages except R. Is there any way I could speed up the process? Also, I know the code seems messy, so if you have an idea to make this code cleaner, please share.

Your help is very much appreciated

MOHAMMED
  • 400
  • 1
  • 9
  • 1
    What is your code doing? – Onyambu Sep 18 '22 at 22:09
  • 1
    Basically it estimates the local kernel around each point in the vector ‘u’ which has been fed to the function – MOHAMMED Sep 18 '22 at 22:34
  • You might add an example how you use the function. – jay.sf Sep 18 '22 at 22:48
  • Is thyere no optimized function somewhere that could do that? If not give an example of data and what the results would be. Give an explanation of what happens to each datapoint to get thye expected results – Onyambu Sep 18 '22 at 22:51
  • 4
    Notice, that if the code already works you might better ask this on [Code Review](https://codereview.stackexchange.com/help/on-topic) (also with usage example of course). – jay.sf Sep 18 '22 at 22:56
  • @jay.sf I just added an illustration on how to use the functions. – MOHAMMED Sep 18 '22 at 23:48
  • 3
    Profile your code. Figure out where the slow parts are. Focus on improving the speed of the slowest parts. – Gregor Thomas Sep 19 '22 at 01:23
  • Better fit on https://scicomp.stackexchange.com/ IMO. If you agree, flag your own question to request a bounty refund and a migration. – David Eisenstat Sep 26 '22 at 00:59
  • @DavidEisenstat How can I flag my own question and request a bounty refund? – MOHAMMED Sep 26 '22 at 02:04
  • 3
    [This answer](https://stackoverflow.com/questions/64501106/fastest-way-in-r-to-compute-the-inverse-for-large-matrices) might be relevant. Which might explain why the solution proposed [here](https://wudongjie.github.io/speed-up-matrix-inversion-in-R/) achieves a 10 fold speed increase on matrix inversion. – Waldi Sep 26 '22 at 05:55

0 Answers0