
I am running a loop to upload a CSV file from my local machine, convert it to an H2O data frame, then run an H2O model. I then remove the H2O data frame from my R environment and the loop continues. These data frames are massive, so I can only have one loaded at a time (hence the reason for removing each data frame from my environment).

My problem is that H2O creates temporary objects which quickly max out my memory. I know I can restart my R session, but is there another way to flush this out in code so my loop can run happily? When I look at my Task Manager, all my memory is taken up by Java(TM) Platform SE Binary.
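Roughly, the loop looks like this (the directory path, response column, and model call are illustrative placeholders, not my real code):

```r
library(h2o)
h2o.init()

# Placeholder path: one massive CSV per iteration
csv_files <- list.files("data/", pattern = "\\.csv$", full.names = TRUE)

for (f in csv_files) {
  h2o_df <- h2o.importFile(f)   # upload the CSV and convert it to an H2O frame
  model  <- h2o.glm(y = "target", training_frame = h2o_df)  # placeholder model
  # ... use the model ...
  rm(h2o_df)                    # removes the R handle only; see below
}
```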

chipsin
  • What method are you using to remove your data frame at the moment? – Adam Quek May 27 '22 at 05:35
  • Hi Adam, I'm using `rm(h2o_df)` – chipsin May 27 '22 at 07:05
  • Some discussion [here](https://stackoverflow.com/questions/11579765/how-do-i-clean-up-r-memory-without-restarting-my-pc) and [here](https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session) on the OS reclaiming the RAM that is freed up by R. Would be better off restarting your session. – Adam Quek May 27 '22 at 07:27
  • Have you tried running `gc()` after `rm(h2o_df)`? That usu. works for me. – jlhoward May 27 '22 at 08:02
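The pattern jlhoward suggests would look like this inside the loop (a sketch, assuming `h2o_df` is the handle to the current H2O frame):

```r
rm(h2o_df)  # drop the R-side handle to the H2O frame
gc()        # force R's garbage collector to run now, so H2O learns
            # the frame is unreferenced instead of waiting for a later GC
```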

1 Answer


Removing the object from the R session using `rm(h2o_df)` will eventually trigger garbage collection in R, and the delete will be propagated to H2O. I don't think this is ideal, however.

The recommended way is to use `h2o.rm`, or, for your particular use case, `h2o.removeAll` seems best (it takes care of everything: models, data, etc.).
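Applied to a loop like the one in the question, that would look roughly like this (file list and model call are placeholders):

```r
for (f in csv_files) {
  h2o_df <- h2o.importFile(f)
  model  <- h2o.glm(y = "target", training_frame = h2o_df)  # placeholder model
  # ... use the model ...
  h2o.rm(h2o_df)   # delete the frame from the H2O cluster immediately
  rm(h2o_df)       # drop the now-stale R handle
}

# Or, to wipe the cluster clean between iterations
# (note: this removes models as well as data):
h2o.removeAll()
```

The difference from plain `rm()` is that `h2o.rm()` frees the memory on the H2O (Java) side right away, rather than waiting for R's garbage collector to notify the cluster.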

Michal Kurka