You are performing a large number of independent integrations. You can speed things up by performing these integrations on separate cores simultaneously (if you have a multicore processor available). The problem is that R performs its calculations in a single-threaded manner by default. However, there are a number of packages available that add multithreading/parallel support. I have recently answered a couple of similar questions here and here, with some additional info regarding the relevant packages and functions.
Additionally, as @Mike Dunlavey already mentioned, you should avoid performing the integrations for values of t and d that do not match your criteria. (You are currently performing unneeded function evaluations for these values and then overwriting the results with 0 afterwards.)
I have added a possible improvement below. Note that you will have to create a separate file with your function G included in order to evaluate it on the cluster nodes. In the code below it is assumed that this file is called functionG.R.
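For illustration only, functionG.R could look something like the sketch below; the body of G here is just a placeholder so the structure is clear, and should be replaced by your actual definition:

# functionG.R -- placeholder contents, to be replaced by your real definitions.
# G must accept the integration variable as its first (vectorised) argument,
# plus the extra parameters t and d that integrate() passes through.
G <- function(x, t, d) {
  exp(-x) * t * d   # hypothetical integrand, replace with your actual G
}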
The snippet:
library(doParallel)

F4 <- function(t, d) {
  results <- vector(mode = "numeric", length = max(length(t), length(d))) # zero vector
  logicalVector <- ((d > 0) & (t > TMin))
  relevantT <- t[logicalVector]
  relevantD <- d[logicalVector] # when d is a single element, NA values are created

  if (length(relevantT) > 1 | length(relevantD) > 1)
  {
    if (length(d) == 1) # d is only one element instead of a vector --> replicate it
      relevantD <- rep(d, length(relevantT))
    if (length(t) == 1) # t is only one element instead of a vector --> replicate it
      relevantT <- rep(t, length(relevantD))

    cl <- makeCluster(detectCores())
    registerDoParallel(cl)
    clusterEvalQ(cl, eval(parse("functionG.R"))) # make G available on the workers

    integrationResults <- foreach(i = 1:length(relevantT), .combine = "c") %dopar%
    {
      integrate(G, lower = 0, upper = relevantT[i], relevantT[i], relevantD[i])$value
    }
    stopCluster(cl)
    results[logicalVector] <- integrationResults
  }
  else if (length(relevantT) == 1) # cluster overhead not needed
  {
    results[logicalVector] <- integrate(G, lower = 0, upper = relevantT, relevantT, relevantD)$value
  }

  return(results)
}
My CPU contains 6 physical cores with hyperthreading enabled (x2). These are the results:
> t = -5000:20000
> d = -5000:20000
>
> start = Sys.time()
> testF3 = F3(t,d)
> timeNeededF3 = Sys.time()-start
>
> start = Sys.time()
> testF4 = F4(t,d)
> timeNeededF4 = Sys.time()-start;
> timeNeededF3
Time difference of 3.452825 mins
> timeNeededF4
Time difference of 29.52558 secs
> identical(testF3,testF4)
[1] TRUE
It seems that the cores are constantly in use while running this code. However, you can potentially optimize this code further by pre-splitting the data more efficiently across the cores and then using an apply-type function on each core separately, as in the sketch below.
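A rough sketch of that idea (assuming relevantT, relevantD and functionG.R as in F4 above; this replaces only the foreach part and is not a tested drop-in replacement):

library(parallel)
cl <- makeCluster(detectCores())
clusterEvalQ(cl, eval(parse("functionG.R")))     # make G available on the workers
chunks <- clusterSplit(cl, seq_along(relevantT)) # one consecutive index chunk per worker
integrationResults <- unlist(parLapply(cl, chunks, function(idx, tt, dd) {
  vapply(idx, function(i) {
    integrate(G, lower = 0, upper = tt[i], tt[i], dd[i])$value
  }, numeric(1))
}, relevantT, relevantD))
stopCluster(cl)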
If more optimization is required, you could also take a deeper look at the integrate function. You can potentially play around with its settings and obtain a performance gain by allowing a less strict numerical approximation. As an alternative, you could implement your own simple version of adaptive Simpson quadrature and play around with the discrete step sizes. Most likely you could obtain massive performance increases like this (if you are able/willing to allow more error in the approximation).
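To make the first suggestion concrete: integrate accepts a rel.tol argument (default .Machine$double.eps^0.25), and loosening it trades accuracy for speed. The Simpson code below is only a minimal non-adaptive sketch with a fixed, even number of subintervals, not the adaptive version mentioned above:

# Less strict tolerance inside the foreach loop (the value 1e-3 is illustrative):
integrate(G, lower = 0, upper = relevantT[i], relevantT[i], relevantD[i],
          rel.tol = 1e-3)$value

# Minimal composite Simpson's rule; n must be even (and >= 4 here), and f
# must be vectorised in its first argument, as integrate already requires.
simpson <- function(f, a, b, n = 100, ...) {
  h <- (b - a) / n
  x <- seq(a, b, length.out = n + 1)
  y <- f(x, ...)
  h / 3 * (y[1] + y[n + 1] + 4 * sum(y[seq(2, n, by = 2)]) + 2 * sum(y[seq(3, n - 1, by = 2)]))
}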
EDIT: Updated the code so that it works in all scenarios: d and/or t being valid/invalid numbers or vectors.
REPLY TO COMMENTS
@mawir: you are correct. ifelse(test, yes, no) will return the corresponding yes values for the rows for which test evaluates to TRUE, and it will return the respective no values for the rows for which test evaluates to FALSE. However, it WILL first have to evaluate your yes expression in order to create the yes vector of length(test). This piece of code demonstrates this:
> t = -5000:5
> d = -5000:5
>
> start = Sys.time()
> testF1 = F(t,d)
> timeNeededF1 = Sys.time()-start
> timeNeededF1
Time difference of 43.31346 secs
>
> start = Sys.time()
> testF4 = F4(t,d)
> timeNeededF4 = Sys.time()-start
> timeNeededF4
Time difference of 2.284134 secs
Only the last 5 values of t and d are relevant in this scenario. However, inside the F1 function the ifelse evaluates the mapply on all d and t values first in order to create the yes vector. This is why the function execution takes so long. Next, it selects the elements for which the conditions are met, or 0 otherwise. The F4 function works around this issue.
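For a self-contained illustration of that behaviour (a hypothetical toy example, not part of the original functions): the yes expression below is computed in full, even though only the first element of test needs it:

test <- c(TRUE, rep(FALSE, 4))
slowVector <- function() sapply(1:5, function(i) { Sys.sleep(0.5); i })
system.time(ifelse(test, slowVector(), 0))   # roughly 2.5 s, not 0.5 s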
Furthermore, you say that you obtain a speedup in the scenario where t and d are non-vectors. However, in this scenario no parallelisation is used. You should normally obtain the maximum speedup in the scenarios where one or both of t/d are vectors.
ANOTHER EDIT, in response to Roland's comment:
You could potentially replace clusterEvalQ(cl,eval(parse("functionG.R"))) with clusterExport(cl,"G") if you prefer not to create a separate function file.
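In that case the setup part of F4 would look roughly like this (assuming G is defined in your current workspace):

cl <- makeCluster(detectCores())
registerDoParallel(cl)
clusterExport(cl, "G")   # ship the G object from the master workspace to the workers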