
I'd like to run a model on RStudio Server, but I'm getting this error.

Error: cannot allocate vector of size 57.8 Gb

This is what my data looks like; it has 10,000 rows.

   latitude longitude                 close_date close_price
1  1.501986  86.35068 2014-08-16 22:25:31.925431   1302246.3
2 36.367095 -98.66428 2014-08-05 06:34:00.165876    147504.5
3 36.599284 -97.92470 2014-08-12 23:48:00.887510    137400.6
4 67.994791  64.68859 2014-08-17 05:27:01.404296    -14112.0

This is my model.

library(caret)
library(dplyr)  # the %>% pipe comes from dplyr/magrittr, not caret

# 80/20 split on the outcome
training.samples <- data$close_price %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- data[training.samples, ]
test.data  <- data[-training.samples, ]

model <- train(
  close_price ~ ., data = train.data, method = "knn",
  trControl = trainControl("cv", number = 1),
  preProcess = c("center", "scale"),
  tuneLength = 1
)

My EC2 instance has more than 57 GB of memory available. This is the output of `free`:

             total       used       free     shared    buffers     cached
Mem:      65951628     830424   65121204         64      23908     215484
-/+ buffers/cache:     591032   65360596
Swap:            0          0          0

And it has enough disk space, too. This is the output of `df`:

Filesystem     1K-blocks    Used Available Use% Mounted on
devtmpfs        32965196      64  32965132   1% /dev
tmpfs           32975812       0  32975812   0% /dev/shm
/dev/xvda1     103079180 6135168  96843764   6% /

And these are details on the machine.

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03
goollan
  • Some basics: when you create `train.data` and `test.data`, and you still have `training.samples` around, you're storing 2 copies of your data in memory. Perhaps `rm(training.samples)` to get rid of one copy. But, your data is pretty small, that won't help *too* much. (Though do make sure you don't have other large unneeded objects floating around.) The marked duplicate has lots of general advice and info for this issue. – Gregor Thomas May 18 '19 at 21:27
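A minimal sketch of the cleanup the comment suggests, using the object names from the question (`rm()`, `gc()`, and `object.size()` are all base R):

# Drop the split index once train.data and test.data are built; `data`
# itself can also go if only the train/test copies are needed downstream.
rm(training.samples)

# Ask R to release the freed pages and report current usage.
gc()

# List the remaining objects by size (in bytes) to spot large leftovers.
sort(sapply(ls(), function(nm) object.size(get(nm))), decreasing = TRUE)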

1 Answer


Because R keeps a temporary value ("*tmp*") as well as the final value, you need roughly 2 to 3 times the projected object size free to do anything useful with an object. (The link talks about subset assignment, but the same applies to any use of the `<-` function.) Furthermore, to assign a new value to an object name there must be a contiguous block of memory available, so even memory that is reported as "available" may be too fragmented to use. You either need to buy more memory or reduce the size of your model. Calculations are all done in RAM or a RAM equivalent; there is not usually any disk swapping unless your OS provides virtual memory.
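A rough sanity check, sketched here with base R only: the error message already tells you how large the failed allocation was, and the 2-to-3-times rule above tells you how much free RAM that implies.

# A 57.8 Gb vector of doubles (8 bytes each) holds roughly
57.8 * 1024^3 / 8    # ~7.8 billion elements

# far more than 10,000 rows x 4 columns, so some step is building
# an enormous intermediate object. For scale, one million doubles:
print(object.size(numeric(1e6)), units = "Mb")    # ~7.6 Mb

# Current memory use as R sees it:
gc()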

IRTFM