3

I'm using glmnet and glmnetcr to fit ordinal regression models.

Unfortunately, my model matrix is ~640000 * 5000. This is larger than can be stored in a 32bit integer and I'm running into the same problem others have described: R vector size limit: "long vectors (argument 5) are not supported in .C"

If I only use half of the data, I can run this on my local server with plenty of memory and have no problems.

I've attempted to implement the 'solution' in the above post by using the dotCall64 package. I've replaced the .Fortran calls with .C64 and specified the data type for each variable. However, each time I run my code I either get nonsensical lambda values (9.9e35) or segfaults such as:

* caught segfault * address 0x1511aaeb0, cause 'memory not mapped'

Which one I get and the exact address varies each time so I assume I'm doing something wrong in implementing this solution.

Here is the code so far in the function lognet() (the function ultimately called by glmnetcr and glmnet and passes the variable to the fortran code)

Original code in lognet()

.Fortran("lognet", parm = alpha, nobs, nvars, nc, as.double(x), 
        y, offset, jd, vp, cl, ne, nx, nlam, flmin, ulam, thresh, 
        isd, intr, maxit, kopt, lmu = integer(1), a0 = double(nlam * 
            nc), ca = double(nx * nlam * nc), ia = integer(nx), 
        nin = integer(nlam), nulldev = double(1), dev = double(nlam), 
        alm = double(nlam), nlp = integer(1), jerr = integer(1), 
        PACKAGE = "glmnet")

Modified code in lognet()

.C64("lognet", SIGNATURE = c("double","int",   "int",   "int",   "int64",                         
                             "double","double","int",   "double","double"
                             "int",   "int",   "int",   "double","double",
                             "double","int",   "int",   "int",   "int",
                             "int",   "double","double","int",   "int",
                             "double","double","double","int",    "int"),
                parm = alpha, nobs, nvars, nc, as.double(x), 
                y, offset, jd, vp, cl, ne, nx, nlam, flmin, ulam, thresh, 
                isd, intr, maxit, kopt, lmu = integer(1), a0 = double(nlam * nc), ca = double(nx * nlam * nc), ia = integer(nx), 
                nin = integer(nlam), nulldev = double(1), dev = double(nlam), 
                alm = double(nlam), nlp = integer(1), jerr = integer(1), 
                PACKAGE = "glmnet")

Toy example (data much smaller than actual)

library(glmnetcr)
library(dotCall64)

x1 <- cbind(c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),c(0,0,0,1,0,1,1,1,0,0,0,0,0,1,1,1),c(0,0,1,0,1,0,1,1,0,0,0,0,1,0,1,1),c(0,1,0,0,1,1,0,1,0,0,0,0,1,1,0,1),c(0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1),c(0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1),c(0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1))
y1 <- c(0,0,0,1,1,1,2,2,0,1,0,1,1,2,1,2)

testA  <- glmnetcr(x=x1,y=y1,method = "forward", nlambda=10,lambda.min.ratio=0.001, alpha =1,maxit = 500,standardize=FALSE)

Running this with the original lognet() code produces no problems. Running it with the modified lognet() code causes odd lambda value estimates and/or segfaults (seems to be random which one happens). My first guess is that I have one of the variables typed incorrectly, but I've went through everything twice and can't see the problem. The other option is that the underlying fortran code can't handle 64bit integers. I know zero fortran and am not even sure how to begin fixing the problem if this is the case.

  • 1
    I would guess that correspondence with the package authors might be more likely to bear fruit. – IRTFM Feb 08 '19 at 20:43
  • Did you modify the compiled code to use long integer vectors for indexing? An example of how to use dotCall64 is given in https://doi.org/10.1016/j.softx.2018.06.002. – Nairolf Feb 27 '19 at 00:53
  • For more detailed support, please provide a **minimal** reproducible example, maybe derived from the example in the paper mentioned above. – Nairolf Feb 27 '19 at 00:57

1 Answers1

2

So I reached out to the package maintainers of glmnet. They've had experience with the conversion to .C64. With their help and a little fiddling, I was able to get the following code to work. To run this, I made a new function called glmnet64 that called another new function lognet64 instead of the original lognet call. lognet64 was the same as the original lognet function, but replaced the .Fortran call with the following:

.C64("lognet", SIGNATURE =   c("double", "integer","integer","integer","double",
                               "double", "double", "integer","double", "double",
                               "integer","integer","integer","double", "double",
                               "double", "integer","integer","integer","integer",
                               "integer","double", "double", "integer","integer",
                               "double", "double", "double","integer","integer"),
          parm = alpha,nobs,           nvars,            nc,      as.double(x), 
          y,           offset,         jd,                  vp,      cl, 
          ne,          nx,             nlam,                flmin,   ulam, 
          thresh,      isd,            intr,                maxit,   kopt, 
          lmu = integer(1),    a0 = double(nlam * nc), 
          ca = double(nx * nlam * nc), ia = integer(nx), nin = integer(nlam), 
          nulldev = double(1), dev = double(nlam),     alm = double(nlam),          
          nlp = integer(1), jerr = integer(1), 
          INTENT = c(rep("rw",4),"r",rep("rw",15),rep("w",10)),     
          PACKAGE = "glmnet",
          NAOK = TRUE)

The key seemed to be getting all variable types specified correctly. Was able to use browser() right before the .Fortran call to get that right. Also, speed improvement by specifying INTENT and setting NAOK = TRUE (as expected). Would definitely recommend those.

  • All input R objects are converted to the type specified in `SIGNATURE`. Hence, one can write `lmu=1` instead of `lmu=integer(1)` and `1` will be converted to an integer because its signature is `"integer"`. – Nairolf Apr 16 '19 at 15:40