I am trying to use R to estimate a multinomial logit model with a manual specification. I have found a few packages that allow you to estimate MNL models here or here.

I've also found some other write-ups on "rolling" your own MLE function here. However, from my digging around, all of these functions and packages rely on R's built-in optim function.
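
For anyone unfamiliar with the pattern, here is a minimal sketch of what I mean by "rolling your own" MLE via optim (a toy normal log-likelihood for illustration, not the MNL model in question; all names are placeholders):

# toy example: estimate the mean and (log) sd of a normal sample
# by minimizing the negative log-likelihood with optim
set.seed(1)
y <- rnorm(1000, mean = 2, sd = 3)

negll <- function(par, y) {
  mu    <- par[1]
  sigma <- exp(par[2])   # log-parameterization keeps sigma positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(c(0, 0), negll, y = y, method = "BFGS", hessian = TRUE)
fit$par                  # estimates of mu and log(sigma)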

In my benchmark tests, optim is the bottleneck. Using a simulated dataset with ~16000 observations and 7 parameters, R takes around 90 seconds on my machine. The equivalent model in Biogeme takes ~10 seconds. A colleague who writes his own code in Ox reports around 4 seconds for this same model.

Does anyone have experience with writing their own MLE function or can point me in the direction of something that is optimized beyond the default optim function (no pun intended)?

If anyone wants the R code to recreate the model, let me know - I'll gladly provide it. I haven't provided it here since it isn't directly relevant to the problem of optimizing the optim function, and to preserve space...

EDIT: Thanks to everyone for your thoughts. Based on a myriad of comments below, we were able to get R in the same ballpark as Biogeme for more complicated models, and R was actually faster for several smaller / simpler models that we ran. I think the long term solution to this problem is going to involve writing a separate maximization function that relies on a fortran or C library, but am certainly open to other approaches.

Chase
  • Devil is in the details. You could mess with `optim` parameters (see the section about `control` in the documentation). You could compare the default parameters with those used by your colleague's code or by Biogeme. Are they different? If yes, why? – Marek Sep 21 '10 at 07:17
  • @Marek - Biogeme relies on a custom maximization routine written in C and it's a similar story with Ox. This is a new area for me, but I'm starting to learn about the different approaches used. – Chase Sep 23 '10 at 02:38
  • As far as I understand, nlm() and possibly the other optimization routines in R are written in C already. I'd rather advise you to look for direct access to those internal functions so you get rid of the overhead, instead of re-inventing the wheel. – Joris Meys Sep 28 '10 at 13:56

4 Answers

Have you tried the nlm() function already? Don't know if it's much faster, but it does improve speed. Also check the options: optim uses a slow algorithm as the default. You can gain a > 5-fold speedup by using the quasi-Newton algorithm (method="BFGS") instead of the default. If you're not too concerned about the last digits, you can also relax the tolerance levels of nlm() to gain extra speed.

# simple quadratic test function with its minimum at x = 1:length(x)
f <- function(x) sum((x - 1:length(x))^2)

# starting values
a <- 1:5

# optim with the default Nelder-Mead algorithm
system.time(replicate(500,
     optim(a, f)
))
   user  system elapsed 
   0.78    0.00    0.79 

# optim with the quasi-Newton (BFGS) algorithm
system.time(replicate(500,
     optim(a, f, method = "BFGS")
))
   user  system elapsed 
   0.11    0.00    0.11 

# nlm with the default tolerances
system.time(replicate(500,
     nlm(f, a)
))
   user  system elapsed 
   0.10    0.00    0.09 

# nlm with relaxed step and gradient tolerances
system.time(replicate(500,
      nlm(f, a, steptol = 1e-4, gradtol = 1e-4)
))
   user  system elapsed 
   0.03    0.00    0.03 
Joris Meys
  • Joris, can you explain *Don't know if it's much faster, but it does improve speed* once more? I had only two coffees today. Do you mean "faster, but unsure by how much"? – Dirk Eddelbuettel Sep 21 '10 at 12:15
  • Errr, more coffee needed on this side of the keyboard too... Indeed, it's faster on the functions I tested it on, sometimes marginally, sometimes quite a bit. It also depends on the control settings, as illustrated in the code. – Joris Meys Sep 21 '10 at 12:35
  • thanks for the tip on nlm(). It ended up being the fastest after altering the step and gradient as you outlined above. – Chase Sep 23 '10 at 02:39

I am the author of the R package optimParallel, which could be helpful in your case. The package provides parallel versions of the gradient-based optimization methods of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce optimization times, as illustrated in the following figure (p is the number of parameters).

[Figure: optimization times of optim() vs. optimParallel() for increasing numbers of parameters p]

See https://cran.r-project.org/package=optimParallel and http://arxiv.org/abs/1804.11058 for more information.
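
As a minimal sketch of typical usage (the objective negll and data x below are placeholders for your own model, not part of the package; optimParallel() parallelizes the "L-BFGS-B" method and needs a default cluster to be registered first):

library(optimParallel)
library(parallel)                 # for makeCluster() / setDefaultCluster()

# register a default cluster; function and gradient evaluations are spread over its workers
cl <- makeCluster(2)
setDefaultCluster(cl = cl)

# placeholder objective: negative log-likelihood of a normal sample
x <- rnorm(1000, mean = 5, sd = 2)
negll <- function(par, x) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))

# same calling convention as optim()
fit <- optimParallel(par = c(1, 1), fn = negll, x = x,
                     method = "L-BFGS-B", lower = c(-Inf, 1e-6))
fit$par

stopCluster(cl)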

Nairolf

Did you consider the material on the CRAN Task View for Optimization?

Dirk Eddelbuettel

FWIW, I've done this in C-ish, using OPTIF9. You'd be hard-pressed to go faster than that. There are plenty of ways for something to go slower, such as by running an interpreter like R.

Added: From the comments, it's clear that OPTIF9 is used as the optimizing engine, so most likely the bulk of the time is spent evaluating the objective function in R. While it is possible that C functions are used underneath for some of the operations, there is still interpreter overhead. There is a quick way to determine which lines of code and function calls in R are responsible for most of the time: pause execution with the Escape key and examine the call stack. If a statement costs X% of the time, it is on the stack X% of the time. You may find operations that are not going to C and should be. Any speedup factor you gain this way carries over when you later find a way to parallelize the R execution.
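
For reference, the same question of where the time goes can also be answered with R's built-in sampling profiler instead of manual pausing; a minimal sketch (start_values and negll are placeholders for the actual starting values and objective function):

# sample the R call stack every 10 ms while the optimization runs
Rprof("mle_profile.out", interval = 0.01)
fit <- optim(start_values, negll, method = "BFGS")
Rprof(NULL)

# functions appearing in a large share of the samples dominate the run time
head(summaryRprof("mle_profile.out")$by.self)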

Mike Dunlavey
  • Quite a few of the underlying routines of R are in C or Fortran. So theoretically, R should be able to get close to C in speed. Now it's just a matter of finding that nice implementation. I don't know whether OPTIF9 is used, but it would be a nice idea for an extra package ;-) – Joris Meys Sep 21 '10 at 13:15
  • The Dennis-Schnabel algorithm 'optif9' is implemented in R in nlm (a quick look at the R source code reveals that it is a C rewrite of the old Fortran subroutine). – eyjo Sep 21 '10 at 18:56
  • @eyjo: @Joris: That's good, and it's what I would expect. Then the bulk of the time would be spent in the user-coded objective function in R. If it were possible to translate that to a compiled language, it should be in the same ballpark as C. – Mike Dunlavey Sep 21 '10 at 19:45
  • Yes, you will have that problem. As Chase said, Ox, which is more or less a customized C compiler, does his model in 4 sec vs. the 90 sec in R. But you can't beat the flexibility of working with statistics in R, and you can easily extend it with custom C calls. If you write the objective function in C with the R.h, Rinternals.h and R_ext/Applic.h headers included, call the relevant optim routine from there, and compile it for R, I think you should be pretty close, and it works seamlessly in R. – eyjo Sep 21 '10 at 20:29
  • @eyjo: Right. The objective function is going to get called hundreds of times for every pass through the rest of the code, because it is optif9's inner loop. So I would just put that function in a temporary infinite loop and tune the daylights out of it, if I couldn't compile it. – Mike Dunlavey Sep 21 '10 at 20:44