
I am currently trying to calculate several linear regressions on a large dataset in R (around 2 million observations and around 10 variables). Most of the variables take values between 0 and 100. When I try to run a regression with lm(), it produces the following error:

    Error: vector memory exhausted (limit reached?)
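For context, the call looks roughly like this (the column names are placeholders for my actual variables):

    # roughly 2 million rows and ~10 numeric predictors
    fit <- lm(y ~ var1 + var2 + var3, data = df)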

After scaling my variables with the rescale() function (from the scales package):

    library(scales)
    rescale(df$var1, to = c(0, 1))

R is suddenly able to calculate my regressions. Do you have any idea why R can only calculate the coefficients after I convert my variables with rescale()? Also, do you know of any more memory-efficient ways to fit linear models? I tried biglm() as well, but it produced the same error with the unscaled numbers.
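(For reference, the chunked pattern that biglm() is designed for looks roughly like this; `y`, `var1`, `var2`, and the row indices are placeholders:)

    library(biglm)
    # biglm keeps only the model's cross-product matrix in memory,
    # so rows can be fed in chunks instead of all at once
    fit <- biglm(y ~ var1 + var2, data = df[1:500000, ])
    fit <- update(fit, df[500001:1000000, ])  # process the next chunk
    summary(fit)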

  • That's a really tough question to answer without any details. It looks like you're fairly new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes sample data like the output from `dput(head(dataObject))` and any libraries you are using. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Feb 08 '22 at 13:50
  • Check if `df$var1` might be stored as the wrong type (e.g. as character or factor): `typeof(df$var1)`. You might need to convert it to a numeric type first; a sketch of this fix follows the thread. – Sandwichnick Feb 08 '22 at 14:06
  • @Kat thanks for the link, I definitely need to improve on that! – ABCE Feb 08 '22 at 14:27
  • @Sandwichnick it worked now after converting it with as.numeric() - thank you so much!!! – ABCE Feb 08 '22 at 14:27
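
A minimal sketch of the fix confirmed in the comments, with `df$var1` standing in for the affected column (the dummy-expansion explanation is a likely mechanism, not something stated in the thread):

    # If a numeric column is stored as character or factor, lm() treats it as
    # categorical and expands it into one dummy column per unique value,
    # easily enough to exhaust memory on ~2 million rows.
    typeof(df$var1)                               # "character" signals the problem
    class(df$var1)                                # "factor" is the other common culprit
    df$var1 <- as.numeric(as.character(df$var1))  # as.character() first is safe for factors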

0 Answers