
I am currently reading fairly large data sets into Spark for parsing (a single data frame is over 1 million rows). To make effective use of the h2o.gbm() model, I am concatenating multiple data frames into one larger training set. When I run the following code:

       training2 <- as_h2o_frame(sc, Training, strict_version_check = FALSE)
       Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
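
For reference, here is roughly how the combined training set is built before the conversion (a minimal sketch; the connection setup, file paths, and `df_part1`/`df_part2` are placeholders, and `sdf_bind_rows()` is sparklyr's row-wise union):

       library(sparklyr)
       library(rsparkling)

       sc <- spark_connect(master = "local")

       # each part is a Spark DataFrame with the same schema
       df_part1 <- spark_read_csv(sc, "part1", "data/part1.csv")
       df_part2 <- spark_read_csv(sc, "part2", "data/part2.csv")

       # union the parts into one training frame (this stays inside Spark)
       Training <- sdf_bind_rows(df_part1, df_part2)

       # the conversion to an H2O frame is where the OOM occurs
       training2 <- as_h2o_frame(sc, Training, strict_version_check = FALSE)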

I have tried to give Java more memory by running the following command:

       options(java.parameters = "-Xmx100G")

I am currently running a 32-core VM with 460 GB of memory, with Spark 2.0.2, rsparkling 2.0.10, and H2O 3.10.5.1. The issue does go away when I run the same code on smaller data sets. Any ideas or insight into this issue would be greatly appreciated.
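
As far as I can tell, `options(java.parameters = ...)` only affects JVMs started through rJava inside the R session, not the JVM that sparklyr launches for Spark (which is where Sparkling Water/H2O runs). A minimal sketch of raising the Spark JVM memory through `spark_config()` instead (the sizes below are guesses for this machine, not tested values):

       library(sparklyr)

       conf <- spark_config()
       # in local mode the driver JVM holds all the data, so it needs most of the RAM
       conf$`sparklyr.shell.driver-memory` <- "200g"
       # on a cluster, these are the usual knobs instead
       conf$spark.driver.memory   <- "200g"
       conf$spark.executor.memory <- "32g"

       sc <- spark_connect(master = "local", config = conf)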

MMitch
  • Your only option in that case is to keep less data in memory at once. I don't know whether it's possible here, but you'd have to use something like streams to keep the data below the maximum heap size. Also check that you are using a 64-bit Java so you can use more than 2 GB of RAM. – Christian Jul 20 '17 at 14:37
  • Note one *possible* solution you have not tried is to disable the GC Overhead check; see the linked Q&A and the sketch after these comments. But beware that if the real reason for the problem is that `-Xmx100G` is not enough, then disabling the GC Overhead check will result in your application taking a very long time to die. – Stephen C Jul 20 '17 at 14:51
  • @Christian - if he wasn't using a 64-bit JVM then `-Xmx100G` would fail on startup. – Stephen C Jul 20 '17 at 14:55
  • Try writing the dataset out to a file and parsing it again with h2o.importFile(); a sketch of that route follows these comments. You didn't say how many columns you have (which matters a lot), but in general 1M rows isn't considered large for H2O. – TomKraljevic Jul 26 '17 at 06:05
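
Regarding the GC overhead check mentioned above: a sketch of passing that HotSpot flag to the Spark driver JVM through sparklyr (assuming local mode; `sparklyr.shell.driver-java-options` maps to spark-submit's `--driver-java-options`):

       conf <- spark_config()
       # disable the "GC overhead limit exceeded" check; the OOM may then surface
       # elsewhere, or the JVM may simply grind along very slowly before dying
       conf$`sparklyr.shell.driver-java-options` <- "-XX:-UseGCOverheadLimit"
       sc <- spark_connect(master = "local", config = conf)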
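
And a sketch of the file-based route suggested above (the path is a placeholder; this assumes an H2O cluster is already running via rsparkling's `h2o_context(sc)`):

       library(h2o)

       # write the combined Spark frame to disk as CSV part files
       spark_write_csv(Training, path = "/tmp/training_csv", header = TRUE)

       # let H2O parse the files directly, skipping the in-memory
       # Spark -> H2O conversion that triggers the GC overhead error
       training2 <- h2o.importFile("/tmp/training_csv")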

0 Answers