I want to parallelize my data writing process. I am writing a data frame to Oracle Database. This data has 4 million rows and 8 columns. It takes 6.5 hours without parallelizing.

When I try to go parallel, I get the error

Error in checkForRemoteErrors(val) : 
  7 nodes produced errors; first error: No running JVM detected. Maybe .jinit() would help.

I know this error and can solve it when I work in a single R session, but I do not know how to tell the other nodes of the cluster where Java is located. Here is my code:

Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') 
library(rJava)
library(RJDBC)
library(DBI)
library(compiler)
library(dplyr)
library(data.table)

jdbcDriver = JDBC("oracle.jdbc.OracleDriver", classPath = "C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"")
jdbcConnection = dbConnect(jdbcDriver, "jdbc:oracle:thin:@//XXXXX", "YYYYY", "ZZZZZ")

Using Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') solves the problem when I run on a single core. But when I go parallel:

library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, varlist = list("jdbcConnection", "brand3.merge.u"))
clusterEvalQ(cl, .libPaths("C:/Users/onur.boyar/Documents/R/win-library/3.5"))
clusterEvalQ(cl, library(RJDBC))
clusterEvalQ(cl, library(rJava))

parLapply(cl, 1:length(brand3.merge.u$CELL_PH_NUM), function(x) {
  dbSendUpdate(jdbcConnection,
               "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)",
               brand3.merge.u[x, 1], brand3.merge.u[x, 2], brand3.merge.u[x, 3],
               brand3.merge.u[x, 4], brand3.merge.u[x, 5], brand3.merge.u[x, 6],
               brand3.merge.u[x, 7], brand3.merge.u[x, 8])
})

# brand3.merge.u is the data frame that I am trying to write.

I get the above error and I do not know how to set my Java location for other nodes.

I want to use parLapply since it is faster than foreach. Any help would be appreciated. Thanks!

boyaronur

1 Answer

JAVA_HOME environment variable

If the problem really is the location of Java, you could set the environment variable in your .Renviron file, which is likely located at ~/.Renviron. Add the line below to that file and it will be picked up by every R session started under your user:

JAVA_HOME='C:/Program Files/Java/jre1.8.0_181'

Alternatively, you can just add that location to your PATH environment variable.
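
If editing .Renviron is not convenient, the variable can also be set on each worker once the cluster exists. This is a minimal sketch, assuming the JRE path from the question and a two-worker cluster chosen only for illustration; the names java_home and worker_java_home are made up for this example. JAVA_HOME should be set on the worker before rJava is loaded there.

```r
library(parallel)

# Assumption: same JRE path as in the question; adjust to your installation.
java_home <- "C:/Program Files/Java/jre1.8.0_181"

cl <- makeCluster(2)  # two workers, just for illustration

# Copy the path to every worker, then set JAVA_HOME in each worker session.
clusterExport(cl, "java_home")
invisible(clusterEvalQ(cl, Sys.setenv(JAVA_HOME = java_home)))

# Check that each worker now sees the variable.
worker_java_home <- unlist(clusterEvalQ(cl, Sys.getenv("JAVA_HOME")))
print(worker_java_home)

stopCluster(cl)
```

Only after this step would library(rJava) and library(RJDBC) be loaded on the workers with clusterEvalQ.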

JVM Initialization via rJava

On the other hand, the error message may simply mean that the JVM has not been initialized on the workers, which you can solve with .jinit. A minimal example:

library(parallel)
cl <- makeCluster(detectCores())
parallel::parLapply(cl, 1:5, function(x) {
  rJava::.jinit()
  rJava::.jnew(class = "java/lang/Integer", x)$toString()
})
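
For the database write in the question, one more point probably matters beyond .jinit: a JDBC connection is a reference to a Java object living in one JVM, so a connection created in the main session and sent over with clusterExport is, as far as I know, not usable on the workers. The sketch below shows one possible alternative in which every worker starts its own JVM and opens its own connection, and the rows are split into one chunk per worker. The helper names init_worker and write_rows are hypothetical, the paths, URL, credentials and table name are the placeholders from the question, and the database part has not been run against Oracle. Only the index-splitting part at the end executes as written.

```r
library(parallel)

# Hypothetical helper: run once on each worker to start a JVM and open a
# connection that belongs to that worker's own R session.
init_worker <- function() {
  Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre1.8.0_181")
  library(RJDBC)
  rJava::.jinit()
  drv <- JDBC("oracle.jdbc.OracleDriver",
              classPath = "C:/Program Files/directory/ojdbc6.jar",
              identifier.quote = "\"")
  # Placeholders from the question: URL, user, password.
  conn <- dbConnect(drv, "jdbc:oracle:thin:@//XXXXX", "YYYYY", "ZZZZZ")
  assign("conn", conn, envir = .GlobalEnv)  # keep it for later calls
  invisible(TRUE)
}

# Hypothetical helper: insert every row of one chunk using the worker's
# own connection. Assumes the chunk has the 8 columns from the question.
write_rows <- function(chunk) {
  conn <- get("conn", envir = .GlobalEnv)
  for (i in seq_len(nrow(chunk))) {
    dbSendUpdate(conn,
                 "INSERT INTO xxnvdw.an_cust_analytics VALUES(?,?,?,?,?,?,?,?)",
                 chunk[i, 1], chunk[i, 2], chunk[i, 3], chunk[i, 4],
                 chunk[i, 5], chunk[i, 6], chunk[i, 7], chunk[i, 8])
  }
  nrow(chunk)
}

# Runnable part: split row indices into one contiguous chunk per worker.
n_rows <- 10     # stand-in for nrow(brand3.merge.u)
n_workers <- 3   # stand-in for detectCores() - 1
chunks <- splitIndices(n_rows, n_workers)
print(chunks)

# With a database available, the pieces would be combined roughly like this:
# cl <- makeCluster(n_workers)
# clusterCall(cl, init_worker)
# pieces <- lapply(chunks, function(i) brand3.merge.u[i, , drop = FALSE])
# clusterApply(cl, pieces, write_rows)
# stopCluster(cl)
```

Sending one chunk per worker, rather than one task per row, also keeps the communication between the main session and the workers small, which may matter with 4 million rows.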

Working around Java use

This was not specifically asked, but you can also avoid the Java dependency altogether by using ODBC drivers, which should be available for Oracle:

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "[your driver's name]",
  ...
)
Jozef
  • Hello Jozef, thank you very much for your help. I am using your second solution and am now getting a different error: Error in checkForRemoteErrors(val) : 7 nodes produced errors; first error: RcallMethod: attempt to call a method of a NULL object. I also have a question: what exactly does .jnew(class = "java/lang/Integer", x)$toString() do? I checked the documentation, but why do we use "java/lang/Integer"? Thanks – boyaronur Jan 15 '19 at 08:08
  • Hi, sorry for the confusion, the example with `.jnew` was just a general minimal example to show the `.jinit` and return something, not related to your issue. To answer the next error I would suggest you make a new question with a somewhat reproducible example. – Jozef Jan 15 '19 at 08:40
  • Okay. My final code is parLapply(cl, 1:length(brand3.merge.u$CELL_PH_NUM), function(x) { rJava::.jinit(); dbSendUpdate(....) }). I included rJava::.jinit() in my parLapply as you suggested. Is this the correct approach? – boyaronur Jan 15 '19 at 08:50