
I'm trying to merge a data frame with 5 million rows and 10 columns with another one that has 600k rows and 145 columns, joining on a few columns:

```r
calculs_user <- inner_join(calculs_user,champs_diag, by=c("ID.user" = "ID.user","Date" = "Date", "Year" = "Year", "Form" = "Form"))
```

but I keep getting the error "cannot allocate vector of size 1.6 Gb". I'm working on a machine with 16 GB of RAM and 64-bit R, and I tried the solutions in "R memory management / cannot allocate vector of size n Mb", but they did not help. My memory size limit is set to 16 GB, so I don't understand why 1.6 GB would be too much!
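A rough check, assuming the vector R failed to allocate holds 8-byte doubles: 1.6 GB is roughly 200 million elements, far more than the 5 million rows of `calculs_user`. The message reports only that one extra allocation, on top of what R already holds, so a figure this large can mean the inner join is producing many more rows than expected, for example because the key columns are duplicated in both tables. The sketch below counts rows per key in each table and sums the products, which is the number of rows an inner join would return, without building the join. It assumes the data.table package is installed; `key_counts()` and `join_size()` are hypothetical helper names.

```r
library(data.table)

# Elements in a 1.6 GB vector, assuming 8-byte doubles
# (about 2e8 with either a 1000^3 or a 1024^3 definition of GB)
1.6 * 1024^3 / 8

# Hypothetical helper: rows per key, built from the key columns only
key_counts <- function(d, keys) {
  cols <- lapply(setNames(keys, keys), function(k) d[[k]])
  as.data.table(cols)[, .N, by = keys]
}

# Hypothetical helper: number of rows an inner join on `keys` would return,
# i.e. the sum over shared keys of (rows in x) * (rows in y)
join_size <- function(x, y, keys) {
  both <- merge(key_counts(x, keys), key_counts(y, keys), by = keys)
  sum(as.numeric(both$N.x) * both$N.y)   # as.numeric avoids integer overflow
}

keys <- c("ID.user", "Date", "Year", "Form")
# join_size(calculs_user, champs_diag, keys)   # compare with nrow(calculs_user)
```

If the result is much larger than `nrow(calculs_user)`, the memory problem comes from the size of the join result rather than from the machine alone.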

Thanks

apauron
  • Does this answer your question? [Increasing (or decreasing) the memory available to R processes](https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes) – Adam Quek May 25 '22 at 09:06
  • Hi, thanks for your suggestion, but I already set the memory.limit() to 160 GB. This does not change the error. I think my machine is just not powerful enough. – apauron May 25 '22 at 09:09

1 Answer


Have you tried using data.table while freeing as much RAM as possible? dplyr is a great package, but it can be memory-hungry with very large datasets; in my experience, data.table handles them better. It is also worth pointing out that gc() does not always release all unused memory, so you might want to restart R (for example, by restarting RStudio) before running the code.
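To free RAM before the join, it can help to see which objects actually take up memory, remove the ones the join does not need with rm(), and then call gc(). A minimal base-R sketch; `object_sizes_mb()` is a hypothetical helper name, and `object.size()` only gives an estimate (for example, it does not account for memory shared between objects).

```r
# Hypothetical helper: approximate sizes (in MB) of the objects in an
# environment, largest first, to decide what can be removed before the join
object_sizes_mb <- function(env = globalenv()) {
  nms <- ls(envir = env)
  sizes <- vapply(nms, function(n) as.numeric(object.size(get(n, envir = env))),
                  numeric(1))
  sort(sizes / 1024^2, decreasing = TRUE)
}

# Usage: inspect, drop what the join does not need, then collect garbage
# object_sizes_mb()
# rm(some_unneeded_object)   # hypothetical object name
# gc()
```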

Here is an example of code you could try that I cannot properly test (as I do not have your data):

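```r
library(data.table)

# setDT() converts in place (by reference), avoiding an extra copy of each table
setDT(calculs_user)
setDT(champs_diag)

# X[i, on = ...] join; nomatch = 0 drops unmatched rows, i.e. an inner join
calculs_user <- calculs_user[champs_diag, on = c("ID.user", "Date", "Year", "Form"), nomatch = 0]
```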
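To check on small made-up tables that this form of data.table join keeps the same rows as an inner join, it can be compared with base `merge()`. The table names `toy_user` and `toy_diag` and their values are invented for illustration; only the key column names come from the question.

```r
library(data.table)

keys <- c("ID.user", "Date", "Year", "Form")

# Made-up toy tables sharing the question's key columns
toy_user <- data.table(ID.user = c(1, 2, 3), Date = "2022-05-25", Year = 2022,
                       Form = "A", x = c(10, 20, 30))
toy_diag <- data.table(ID.user = c(2, 3, 4), Date = "2022-05-25", Year = 2022,
                       Form = "A", y = c("b", "c", "d"))

# Same form as the join above: keeps only keys present in both tables
dt_join <- toy_user[toy_diag, on = keys, nomatch = 0]

# Base R inner join for comparison
base_join <- merge(toy_user, toy_diag, by = keys)
```

Both results should contain only the rows with `ID.user` 2 and 3.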

If this does not work, you could try the ff package, which keeps the data in files on disk while letting R work with it much as if it were in RAM, but I have never used it myself.

And of course, if everything else fails, you can split your data into smaller chunks and join them one at a time. This can become tedious if you need to do it often, but it is still an option.
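A minimal sketch of the chunking idea, assuming data.table is installed and that the full result is too large to keep in memory: rows of `x` are joined to `y` one block at a time, and each partial result is appended to a CSV file with `fwrite()`. Because every row of an inner join comes from exactly one row of `x`, the chunks together give the same rows as the full join. `chunked_join_to_csv()` and the default `chunk_size` are hypothetical choices, not tuned values.

```r
library(data.table)

# Hypothetical helper: inner-join `x` to `y` on `keys` one block of `x` rows
# at a time, appending each partial result to a CSV file so the full result
# never has to be held in memory at once. Returns the number of rows written.
chunked_join_to_csv <- function(x, y, keys, file, chunk_size = 1e6) {
  if (!is.data.table(y)) y <- as.data.table(y)  # ideally setDT() beforehand
  n <- nrow(x)
  if (n == 0) return(0)
  starts <- seq(1, n, by = chunk_size)
  total <- 0
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_size - 1, n)
    chunk <- as.data.table(x[rows, ])
    # Same join as above, restricted to this chunk of x
    res <- chunk[y, on = keys, nomatch = 0]
    # Write the header only with the first chunk; later chunks are appended
    fwrite(res, file, append = i > 1, col.names = i == 1)
    total <- total + nrow(res)
  }
  total
}

# Usage (hypothetical output file name):
# chunked_join_to_csv(calculs_user, champs_diag,
#                     c("ID.user", "Date", "Year", "Form"), "joined.csv")
```

Writing each chunk to disk is what saves memory here; collecting the chunks in a list and binding them at the end would need as much RAM as the full join.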

Darmist