
I am trying to perform elastic-net Cox regression on 120 samples with ~100k features.

I tried R with the glmnet package, but R does not support big matrices (it seems R is not designed for 64 bit). Furthermore, glmnet does support sparse matrices, but for whatever reason sparse matrix + Cox regression has not been implemented.

I am not pushing for R, but it is the only tool I have found so far. Does anyone know a program that can fit elastic net + Cox regression on big models? I did read that I could use a support vector machine, but I would need to compute the model first, and I cannot do that in R due to the above restriction.

Edit: A bit of clarification. I am not reporting an error in R, as it is apparently normal for R matrices to be limited in how many elements they can hold (as for glmnet not supporting sparse matrix + Cox, I have no idea). I am not pushing for a particular tool, but it would be easier if there were another package or a standalone program that can do what I am looking for.

If someone has an idea or has done this before, please share your method (R, Matlab, something else).

Edit 2:

Here is what I used to test: I made a 100 x 100,000 matrix, added labels, and tried to create the model matrix using model.matrix.

data <- matrix(rnorm(100 * 100000), 100, 100000)  # 100 samples x 100k features
formula <- as.formula(class ~ .)
# three class labels, randomly assigned across the 100 samples
x <- c(rep('A', 40), rep('B', 30), rep('C', 30))
y <- sample(x = 1:100, size = 100)
class <- x[y]
data <- cbind(data, class)        # note: coerces the whole matrix to character
X <- model.matrix(formula, data)  # terms expansion over 100k columns blows up

The error I got:

Error: cannot allocate vector of size 37.3 Gb
In addition: Warning messages:
1: In terms.formula(object, data = data) :
  Reached total allocation of 12211Mb: see help(memory.size)
2: In terms.formula(object, data = data) :
  Reached total allocation of 12211Mb: see help(memory.size)
3: In terms.formula(object, data = data) :
  Reached total allocation of 12211Mb: see help(memory.size)
4: In terms.formula(object, data = data) :
  Reached total allocation of 12211Mb: see help(memory.size)

Thank you in advance! :)

Edit 3: Thanks to @marbel I was able to construct a test model that works and does not become too big. It seems my problem came from using cbind in my test; a sketch of the kind of construction that works is below.
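
Something along these lines works (an illustrative reconstruction, not my exact code): keep the features and the labels in separate objects instead of cbind()-ing them together, and skip model.matrix entirely since the features are already numeric.

# Sketch of a working test (illustrative reconstruction, not the exact code):
# keep the numeric features and the class labels separate.
x <- matrix(rnorm(100 * 100000), 100, 100000)      # ~80 MB of doubles
class <- sample(c(rep('A', 40), rep('B', 30), rep('C', 30)))
# x is already numeric, so no model.matrix()/terms() expansion is needed;
# glmnet can take the matrix and labels directly, e.g.:
# fit <- glmnet::glmnet(x, class, family = "multinomial")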

  • This isn't so much a question as a PSA that you ran into an error. Unless you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or specify a question, I'd vote to close it... Also note [SO's question policy](http://stackoverflow.com/help/on-topic) that *Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow* – alexwhitworth Feb 11 '16 at 22:47
  • What are you talking about? What error? I am asking if someone knows a tool that can perform elastic net + cox regression that is not in R (or how to make R use bigger matrices). As for your comment on tools: I do not see how that is relevant. I am struggling with performing "Elastic net + Cox regression using big models". – user985611 Feb 11 '16 at 22:49
  • Maybe if you described the concrete problem you had, it would be easier to help you. Just add the code you've used and the error you got from the console. – marbel Feb 11 '16 at 23:26
  • I'm not sure what you mean by "*it seems R is not designed for 64 bit*" or where you got that impression. R has been available in both 64-bit and 32-bit versions since at least R 3.0 (if I recall correctly). Maybe you have an old installation or have only installed the 32-bit version? – Gregor Thomas Feb 11 '16 at 23:51
  • Hey @Gregor, yes, I have the 64-bit version, but R's matrices are still limited: "R matrices can be addressed in single index notation as they are really a vector with a dim attribute of length 2 and in R vectors are addressed by a signed 32-bit integer even if you are using the 64-bit version. So a 2-column matrix can have a maximum of 2^30-1 rows." – user985611 Feb 11 '16 at 23:57
  • There's a difference between `base::matrix` and the `Matrix` package. I suggest you review the documentation for the `Matrix` package – alexwhitworth Feb 12 '16 at 00:10
  • The dataset is just 73MB. It's just a programming problem. – marbel Feb 12 '16 at 03:41

1 Answer


A few pointers:

That's a rather small dataset; R should be more than enough. All you need is a modern computer, meaning a decent amount of RAM; 4GB should be enough for such a small dataset.
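
As a quick sanity check (a back-of-the-envelope sketch, assuming double-precision values), the matrix itself is small:

# 100 x 100000 doubles at 8 bytes each is about 80 MB
# (R reports it as ~76.3 Mb, i.e. mebibytes)
x <- matrix(rnorm(100 * 100000), 100, 100000)
print(object.size(x), units = "Mb")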

The glmnet package is also available in Julia and Python, but I'm not sure whether the Cox model is implemented in those ports.

There are worked examples of the Cox model with the glmnet package in its documentation. There is also a package called survival.
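
For instance, here is a minimal sketch of elastic-net Cox regression with glmnet; the simulated times, statuses, and smaller feature count are purely illustrative:

library(glmnet)

set.seed(1)
n <- 120; p <- 1000               # fewer features so the example runs quickly
x <- matrix(rnorm(n * p), n, p)   # dense matrix (sparse + Cox not supported)
# the Cox family expects a two-column response: time and status
y <- cbind(time   = rexp(n, rate = 0.1),  # simulated survival times
           status = rbinom(n, 1, 0.7))    # simulated event indicator (1 = event)

# alpha sets the elastic-net mix: 1 = lasso, 0 = ridge
fit   <- glmnet(x, y, family = "cox", alpha = 0.5)
cvfit <- cv.glmnet(x, y, family = "cox", alpha = 0.5)  # cross-validated lambda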

There are at least two problems with your code:

  • This is not something you would like to do in R: data <- cbind(data, class). It is simply not memory efficient. If you need this type of operation, use the data.table package, which allows assignment by reference; check out the := operator.
  • If all your data is numeric, you don't need model.matrix; just use data.matrix(X).
  • If you have categorical variables, run model.matrix on them only, then add the resulting columns to the X matrix one at a time, perhaps with data.table::set or the := operator (see the sketch after this list).
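
Putting those pieces together, a hedged sketch (the column names and sizes here are illustrative, not from the question):

library(data.table)

dt <- as.data.table(matrix(rnorm(100 * 10), 100, 10))         # numeric features
dt[, class := sample(c('A', 'B', 'C'), 100, replace = TRUE)]  # add by reference

# model.matrix() on the categorical column only (tiny terms object) ...
dummies <- model.matrix(~ class - 1, data = dt)
# ... then add the dummy columns back one at a time, by reference:
for (j in colnames(dummies)) set(dt, j = j, value = dummies[, j])

# the numeric columns go straight into a matrix for glmnet:
X <- data.matrix(dt[, setdiff(names(dt), "class"), with = FALSE])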

Hopefully this can help you debug the code. Good luck!

marbel
  • Thank you for your quick response. According to the [glmnet documentation](http://www.inside-r.org/packages/cran/glmnet/docs/glmnet), sparse matrices are not supported with Cox yet. My RAM is 12GB. I will post a code sample in a minute. – user985611 Feb 11 '16 at 23:31
  • Really, ok. In any case, your data is small. You should be fine. – marbel Feb 12 '16 at 03:44