
I am getting the error "cannot allocate vector of size 23628.3 Gb" while training a linear regression model on a dataset of around 1,800,000 observations and around 20 variables.

linear_reg_model <- lm(target_variable ~ ., data = train_cleaned)
– Dhruv
  • Do you have a lot of categorical variables with many levels? It's trying to allocate nearly 24TB of RAM. That seems a bit excessive. Are you sure your data has been read in correctly? How much RAM does your computer have? – MrFlick Apr 26 '19 at 20:12 (a sketch for checking column cardinality follows the comments)
  • 1.8M observations might be more than what you need to get a reliable regression. You might try taking repeated samples of a subset (say, 100k rows at a time) and see how much your results vary between them. – Jon Spring Apr 26 '19 at 20:19 (see the subsampling sketch after the comments)
  • See here for a related question on the memory requirements of `lm` and strategies to address them: https://stackoverflow.com/questions/10326853/why-does-lm-run-out-of-memory-while-matrix-multiplication-works-fine-for-coeffic – Jon Spring Apr 26 '19 at 22:55 (a biglm sketch follows the comments)
  • Yes, that works for me now. – Dhruv May 22 '19 at 16:59
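
Following up on MrFlick's comment: a factor with n levels expands into n - 1 dummy columns in the model matrix that `lm` builds, so a single ID-like column can account for an allocation of this size. A minimal sketch for spotting high-cardinality columns, using `train_cleaned` from the question:

# Count the distinct values of every column; lm() treats character
# columns as factors, so those count too. Columns with thousands or
# millions of distinct values are the likely culprits.
sapply(train_cleaned, function(x) length(unique(x)))

For scale: 23628.3 Gb of doubles spread over 1.8M rows works out to a model matrix with roughly 1.7 million columns, which points at a factor with about that many levels.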
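A rough sketch of the subsampling idea from Jon Spring's first comment. The seed, the number of repeats, and the 100k subset size are illustrative, and it assumes every factor level appears in each subsample so that the coefficient vectors line up:

set.seed(1)  # illustrative seed for reproducibility
coefs <- sapply(1:5, function(i) {
  idx <- sample(nrow(train_cleaned), 1e5)  # random 100k-row subset
  coef(lm(target_variable ~ ., data = train_cleaned[idx, ]))
})
# Each column of 'coefs' holds one subsample's estimates; small row-wise
# standard deviations mean the fits agree across subsamples.
apply(coefs, 1, sd)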
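If the model matrix is of reasonable size once any huge factors are dropped, but `lm` still runs out of memory (the linked question discusses why `lm` needs far more memory than the raw coefficient computation), the biglm package fits the same regression with a much smaller memory footprint. A sketch, assuming biglm is installed; biglm is commonly reported not to expand the `~ .` shorthand, so the formula is built explicitly to be safe:

library(biglm)

# Build the formula explicitly rather than relying on "~ ."
predictors <- setdiff(names(train_cleaned), "target_variable")
f <- as.formula(paste("target_variable ~", paste(predictors, collapse = " + ")))

linear_reg_model <- biglm(f, data = train_cleaned)
summary(linear_reg_model)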

0 Answers