
I am getting a "cannot allocate vector of size" error. I have tried several methods I found online, including `memory.limit(size=)` and `gc()`, but I keep getting the same error. No matter how large a value I pass to `memory.limit(size=)`, I still get "cannot allocate vector of size 3.1 Gb". My laptop has 16 Gb of RAM, and I have confirmed that my R installation is 64-bit. I am trying to analyze a large dataset, but I cannot get a result because of this error.
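
For reference, the memory-related calls I tried looked roughly like this (the exact size value varied between attempts, so the number below is just an example):

    # Raise R's memory cap on Windows (I tried several large values here)
    memory.limit(size = 56000)

    # Force garbage collection to free unused memory before re-running the model
    gc()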

I was trying to run the code below:

    coxph(Surv(time, survival) ~ treatment + tt(treatment), data = lala,
          weights = ps_weight_adj, tt = function(x, t, ...) x * log(t))

My data has more than 195,442 rows and more than 13 variables: most of the columns are `dbl`, and the `treatment` variable used in this code is `chr`.

I have checked Task Manager and I don't think other programs are taking much memory, but no matter how many programs I close, the error always says "cannot allocate vector of size 3.1 Gb".

Could anyone help me? Thank you!

NewRUser
  • Unfortunately we can't possibly answer this question without more detail. The "cannot allocate vector of size ..." means there is not enough **remaining** memory to create the object you want; if you are analyzing a big data set, you may already have several large objects in your workspace that are consuming most of your memory ... – Ben Bolker Aug 01 '22 at 14:42
  • And as a follow-up to @BenBolker's comment: what size is the error reporting? If we are talking about Gb, you should know that R is limited in how large a single object in memory can be. If the reported size is small, it is an indication that your memory is already full with other objects (or other programs). How "huge" is the dataset, and what are you trying to do? – Omniswitcher Aug 01 '22 at 14:50
  • As a very rough rule of thumb, you usually need about 4-5 times as much memory available as the size of the data set you want to work with, so if you are trying to work with a 12 Gb data set in 16 Gb of RAM you may be in trouble ... you may want to check https://cran.r-project.org/web/views/HighPerformanceComputing.html – Ben Bolker Aug 01 '22 at 14:54
  • @Ben Bolker @Omniswitcher The data is 17.1 Mb with more than 10,000 observations, and I was trying to use `coxph` to analyze it. The code is as follows: `coxph(Surv(time, survival)~treatment + tt(treatment),data=lala, weights=ps_weight_adj,tt=function(x,t,...)x*log(t))`. I don't think there are many large objects in my workspace because I barely use R. Do you have any ideas? My data is not as large as you thought, but I keep getting `cannot allocate vector of size 3.1Gb` – NewRUser Aug 01 '22 at 15:08
  • Open up your computer's memory monitor (Task Manager if you're on Windows) and see if there's anything eating up a lot of memory that you can close. See if you can run the model using, say, half your data. If you're still having trouble, edit your question to include the code that you are trying to run along with a description of your data (how many rows, how many columns, maybe what are the classes of the columns? Do you have categorical variables with many unique values?) – Gregor Thomas Aug 01 '22 at 15:25
  • A [mcve] may not be feasible, but can you edit your question to include semi-complete code? It's possible (I really don't know) that using the time-transform (`tt`) feature could radically expand the amount of memory necessary ... ?? – Ben Bolker Aug 01 '22 at 15:29
  • @Gregor Thomas I have already edited the question to include the code and a description of the data. I am really desperate, I don't know what's going on here. – NewRUser Aug 01 '22 at 15:42
  • Have you tried fitting with subsets of your data as @GregorThomas suggested? I'm sorry you feel desperate, but that doesn't help us help you ... – Ben Bolker Aug 01 '22 at 15:48
  • Also: are you sure it makes sense to use what looks like a factor variable (`treatment`) in a time-transform context? The [vignette on time-dependent analysis](https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf) uses a numeric covariate here ... – Ben Bolker Aug 01 '22 at 15:53
  • @Ben Bolker I tried fitting half of my data and it seems to work, but I got an error because, as you said, `treatment` is not a numeric covariate, so I tried another formulation, `tt=function(x,t,...)model.matrix(~x)[,2:4]*log(t)`, treating it as a categorical variable with the first level as the reference; at least the half dataset worked. Do you have any ideas for making the whole dataset work? Thank you! – NewRUser Aug 01 '22 at 18:58
  • Can you use either the memory monitor (see @GregorThomas's comment above) or the `peakRAM` package to see how much memory is required for analyses using 1/8, 1/4, and 1/2 the data? If this is an intrinsically memory-hungry operation, it may be extremely difficult to squeeze it into your existing RAM - finding a cloud (AWS, RStudio...) instance with more memory might be your best option – Ben Bolker Aug 01 '22 at 19:10
  • After I used `peakRAM`, there was not enough memory even for half the dataset (I got `Error: cannot allocate vector of size 2.0 Gb`), so I tried 1/4 of the data, but that gave the error `data contains an infinite predictor`, so it seems that with `model.matrix(~x)[,2]` the code can't be run on the 1/4 dataset. I also checked the memory monitor: when I close R, memory use is around 30%, but when I run the code even on half the data it goes up to 96% – NewRUser Aug 01 '22 at 19:42
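
A minimal sketch of the workspace check suggested in the first comment (seeing whether existing objects are already consuming most of the memory), assuming everything of interest lives in the global environment:

    # Sizes of the objects currently in the global environment, largest first (in MiB)
    obj_sizes <- sapply(ls(envir = .GlobalEnv),
                        function(nm) object.size(get(nm, envir = .GlobalEnv)))
    round(sort(obj_sizes, decreasing = TRUE) / 1024^2, 1)

    # R's own summary of current memory use
    gc()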
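
A sketch of the subset/peak-memory measurement discussed in the last few comments, assuming the data frame is called `lala` as in the question and using the `model.matrix()` time-transform from the comments (the fractions, the seed, and dropping the intercept column with `[, -1]` are illustrative choices, not from the thread):

    library(survival)
    library(peakRAM)  # install.packages("peakRAM") if it is not installed

    set.seed(1)
    for (frac in c(1/8, 1/4, 1/2)) {
      # random subset containing `frac` of the rows
      sub <- lala[sample(nrow(lala), size = floor(frac * nrow(lala))), ]

      # record elapsed time and peak RAM for one fit on this subset
      mem <- peakRAM(
        coxph(Surv(time, survival) ~ treatment + tt(treatment),
              data = sub, weights = ps_weight_adj,
              tt = function(x, t, ...) model.matrix(~ x)[, -1] * log(t))
      )
      cat("fraction of data:", frac, "\n")
      print(mem)
    }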

0 Answers