
I am trying to connect R to Teradata to pull data directly into R for analysis. However, I am getting the following error:

Error in .jcall(rp, "I", "fetch", stride, block) :
  java.lang.OutOfMemoryError: Java heap space

I have tried to set up my R options to increase the max heap size of JVM by doing:

options(java.parameters = "-Xmx8g")

I have also tried to initialize the Java parameters with the rJava function .jinit as .jinit(parameters = "-Xmx8g"), but it still failed.

The calculated size of the data should be approximately 3 GB (actually less than 3 GB).

asked by user3768354 (edited by Andronicus)
  • Can you try using less memory to verify that it works at all? Just because the raw data is only 3GB does not preclude the possibility that the JVM needs more memory than this. – Tim Biegeleisen Jan 06 '16 at 01:08
  • You have to make sure you run `options(java.parameters = "-Xmx8g")` before starting up your Java instance. So start in a fresh R session with NO packages loaded. Run that command and THEN load all your packages and try again. You should be fine but it's possible the JVM needs a lot for other reasons. – stanekam Jan 06 '16 at 01:13
  • I guess the "calculated size of the data" is the size of the meaningful information stored. However, data structures are not ideal in memory consumption: they have fields for internal usage and allocate additional memory to avoid repeated allocations when data is added, so even empty data structures consume some memory. So 3 GB of data can easily require more than 8 GB of working memory. – user3707125 Jan 06 '16 at 01:53

4 Answers


You need to make sure you're allocating the additional memory before loading rJava or any other packages. Wipe the environment first (via rm(list = ls())), restart R/RStudio if you must, and set the option at the very beginning of your script.

options(java.parameters = "-Xmx8000m")

See for example https://support.snowflake.net/s/article/solution-using-r-the-following-error-is-returned-javalangoutofmemoryerror-gc-overhead-limit-exceeded
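As a quick sanity check (a sketch, assuming rJava is installed), you can confirm that the JVM actually picked up the larger heap by querying Runtime.maxMemory() right after startup:

```r
# Fresh R session: set the heap size BEFORE any Java-backed package loads
options(java.parameters = "-Xmx8000m")

library(rJava)
.jinit()  # the JVM starts here and reads java.parameters

# Ask the JVM for its maximum heap size and convert to GB
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_gb  <- .jcall(runtime, "J", "maxMemory") / 1024^3
print(max_gb)  # should be close to 8 if the option was applied in time
```

If this prints a small number (e.g. around 0.5, a common JVM default), some package started the JVM before the option was set.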

– tuxedopong

I had this problem in a way that was not reproducible: -Xmx8g partly solved it, but I still ran into problems at random.

I now found an option with a different garbage collector by using

options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
library(xlsx)

at the beginning of the script, before any other package is loaded, since other packages can start Java themselves and the options have to be set before any Java is loaded.

So far, the problem hasn't occurred again.

Only occasionally, in a long session, can it still happen; in that case a session restart normally solves the problem.
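Before resorting to a full restart, one thing worth trying mid-session (a sketch, assuming rJava is already loaded) is requesting a collection on both the Java and R sides explicitly:

```r
library(rJava)
.jinit()

# Request a garbage collection inside the JVM via the static method
# java.lang.System.gc() ...
J("java.lang.System")$gc()

# ... and on the R side, which also releases finalized Java references
gc()
```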

– drmariod

Running the following two lines of code (before any packages are loaded) worked for me on a Mac:

options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
gc()

This essentially combines two proposals posted previously in this thread. Importantly, running only the first line (as suggested by drmariod) did not solve the problem in my case; only when I additionally executed gc() directly after the first line (as suggested by user2961057) was the problem solved.

Should it still not work, restart your R session and try options(java.parameters = "-Xmx8g") instead (again before any packages are loaded), executing gc() directly afterwards. Alternatively, increase the heap further from "-Xmx8g" to e.g. "-Xmx16g" (provided that your machine has at least that much RAM).

EDIT: Further solutions: When I had to use rJava for model estimations in R (explaining y from a large number of X variables), I kept receiving the above 'OutOfMemory' errors even after scaling up to "-Xmx60000m" (the machine I am using has 64 GB of RAM). The problem was that some model specifications were simply too big (and would have required even more RAM). One solution that may help in such cases is scaling the problem down (e.g. by reducing the number of X variables in the model), or, if possible, splitting the problem into independent pieces, estimating each separately, and putting the pieces together again.

– user2B4L2

I added garbage collection and that solved the issue for me. I am connecting to Oracle databases using RJDBC.

Simply add `gc()`.
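A minimal sketch of what that can look like with RJDBC, fetching in chunks and collecting between fetches (the driver path, connection string, credentials, and table name are placeholders, not from the original answer):

```r
# Set the heap before RJDBC/rJava start the JVM
options(java.parameters = "-Xmx8g")
library(RJDBC)

# Placeholder driver and connection details
drv  <- JDBC("oracle.jdbc.OracleDriver", "/path/to/ojdbc8.jar")
conn <- dbConnect(drv, "jdbc:oracle:thin:@//dbhost:1521/service",
                  "user", "password")

res    <- dbSendQuery(conn, "SELECT * FROM big_table")
chunks <- list()
repeat {
  chunk <- fetch(res, n = 50000)        # pull 50k rows at a time
  if (nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
  gc()                                  # collect between fetches
}
dbClearResult(res)

result <- do.call(rbind, chunks)        # reassemble the full table
dbDisconnect(conn)
```

Fetching in blocks keeps any single allocation small, and the gc() calls give the JVM and R a chance to release memory between blocks.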