I am currently reading fairly large data sets into Spark for parsing (each data frame has over 1 million rows). To make effective use of the h2o.gbm() model, I am then concatenating multiple data frames together to create a larger training set. When I run the following code:
training2 <- as_h2o_frame(sc, Training, strict_version_check = FALSE)
I get the following error:

Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
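For context, this is roughly how the training frame is assembled before the conversion (a sketch: df1 through df3 and the predictor/response columns are placeholder names, and sdf_bind_rows() is the sparklyr call I believe stacks Spark DataFrames row-wise):

library(sparklyr)
library(rsparkling)
library(h2o)

# sc is my existing Spark connection; df1..df3 are placeholder
# Spark DataFrames already read into the session
Training <- sdf_bind_rows(df1, df2, df3)

# The conversion below is where the GC overhead error is thrown
training2 <- as_h2o_frame(sc, Training, strict_version_check = FALSE)

# Placeholder predictor/response names for the GBM
model <- h2o.gbm(x = c("x1", "x2"), y = "response", training_frame = training2)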
I have tried to give Java more memory by running the following command:
options(java.parameters = "-Xmx100G")
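My understanding is that options(java.parameters) only affects JVMs launched through rJava, not the JVM that sparklyr starts for Spark, so I suspect the memory has to be passed through spark_config() before connecting. Something along these lines is what I have in mind (a sketch; the master URL and sizes are placeholders):

library(sparklyr)

# These shell options are read at connect time, so they must be set
# before spark_connect(); the sizes here are placeholders
config <- spark_config()
config$`sparklyr.shell.driver-memory` <- "100G"
config$`sparklyr.shell.executor-memory` <- "100G"

sc <- spark_connect(master = "local", config = config)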
I am currently running a 32-core VM with 460 GB of memory, with Spark 2.0.2, rsparkling 2.0.10, and H2O 3.10.5.1. The issue does go away when I run the same code on smaller data sets. Any ideas or insight into this issue would be greatly appreciated.
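For completeness, I believe h2o.clusterInfo() reports how much memory the H2O cloud actually received, which should confirm whether any memory setting is taking effect:

library(h2o)

# Prints the H2O cluster's total and free memory, node count, and version
h2o.clusterInfo()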