2

I am trying to establish a connection between RStudio (on my machine) and Hive (which is setup on a different server). Here's my R code:

install.packages("RJDBC",dep=TRUE)
require(RJDBC)

drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver", 
       classPath = list.files("C:/Users/37/Downloads/hive-jdbc-0.10.0.jar",
       pattern="jar$",full.names=T),
       identifier.quote="'")

Here is the error I get while executing the above commands:

Error in .jfindClass(as.character(driverClass)1) : class not found conn <- dbConnect(drv, "jdbc:hive2://65.11.23.453:10000/default", "admin", "admin")

I downloaded the jar files from here and placed them in the CLASSPATH. Please advise if am doing anything wrong and how I could get this to work.

Thanks.

o-90
  • 17,045
  • 10
  • 39
  • 63
Mothi
  • 23
  • 1
  • 1
  • 5

6 Answers6

1

If you have a cloudera, check version and download jars for that. Example CDH 5.9.1 hadoop-common-2.6.0-cdh5.9.1.jar hive-jdbc-1.1.1-standalone.jar

copy the jars into a folder into R host and execute:

 library("DBI")

 library("rJava")

 library("RJDBC")

#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/home/youruser/rlibs/hadoop-common-2.6.0-cdh5.9.1.jar", "/home/youruser/rlibs/hive-jdbc-1.1.1-standalone.jar")
.jinit(classpath=cp)

#initialisation de la connexion
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", /home/youruser/rlibs/hive-jdbc-1.1.1-standalone.jar", identifier.quote="`")

conn <- dbConnect(drv,"jdbc:hive2://HiveServerHostInYourCluster:10000/default;", "YourUserHive", "xxxx")

#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases
Carlos Gomez
  • 200
  • 1
  • 12
0

I tried this sample code and it worked for me:

library(RJDBC)

#Load Hive JDBC driver
hivedrv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
                c(list.files("/home/amar/hadoop/hadoop",pattern="jar$",full.names=T),
                  list.files("/home/amar/hadoop/hive/lib",pattern="jar$",full.names=T)))

#Connect to Hive service
hivecon <- dbConnect(hivedrv, "jdbc:hive://ip:port/default")
query = "select * from mytable LIMIT 10"
hres <- dbGetQuery(hivecon, query)
Amar
  • 3,825
  • 1
  • 23
  • 26
  • 1
    Please could let me know where exactly can i get all the jar files required for hive connectivity ? – Mothi Oct 09 '14 at 11:29
  • the first statement works now, but getting error in db connect statement > conn <- dbConnect(drv, "jdbc:hive://54.77.162.203:10000/default") Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException – Mothi Oct 09 '14 at 11:31
  • hive is installed on a different server, i do not have it on my machine. I have R on my machine and am trying to connect to the hive which is hosted on a different server. – Mothi Oct 09 '14 at 11:33
  • That is ok, you just need the jar files. there is some particular jar file you are missing which is giving this class not found exception. Also are you loading library(RJDBC) before trying to create the connection ? – Amar Oct 09 '14 at 11:36
  • This is what i do, install.packages("RJDBC",dep=TRUE) library(RJDBC)drv <- JDBC(driverClass = "org.apache.hadoop.hive.jdbc.HiveDriver", classPath = list.files("C:/Users/370398/Downloads/",pattern="jar$",full.names=T), identifier.quote="'") conn <- dbConnect(drv, "jdbc:hive://54.77.162.203:10000/default"). Would you be able to share the jar files somehow ? or could share a path from where i can take them ? – Mothi Oct 09 '14 at 11:38
  • copy the jar files from the lib folder of serve where hive is installed. Or either download hive in your machine and copy jars from there – Amar Oct 09 '14 at 11:39
  • I have tried these options, i just dont seem to get access to the server where Hive is present, not i can install it on mine due to restricted permissions. Can i check if there is any other way at all please ? – Mothi Oct 09 '14 at 12:21
  • out of box, but can you try pinging that server and see you have access to that server at that specific port ? – Amar Oct 09 '14 at 12:32
  • I managed to download files and now have all jar files i think. But still getting the below error, conn <- dbConnect(drv, "jdbc:hive://54.77.162.203:10000/default") Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class; please could you help ? – Mothi Oct 09 '14 at 13:05
  • Hi, R is hung after the below messages(i guess these are only warnings), but is R establishing a connection ? > conn <- dbConnect(drv, "jdbc:hive://54.77.162.203:10000/default") SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/C:/Users/370398/Downloads/hive-0.12.0-bin.tar/hive-0.12.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/C:/Users/370398/Downloads/hive-0.12.0.tar/hive-0.12.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] – Mothi Oct 13 '14 at 06:38
  • it just says that there are multiple jars files for SLF4j. Try removing one of them and try again...this should go away – Amar Oct 13 '14 at 06:39
  • Have done that and have got rid of the warnings, however R seems hung, does this mean it is establishing connectivity with hive ? – Mothi Oct 13 '14 at 07:07
  • ideally it shouldn't hang if it is making a connection..just check if you have to that server and port – Amar Oct 13 '14 at 07:48
  • Hi - have tried all the below so far - but no luck !!! I am able to establish connectivity using RODBC and respective driver setup, but RJDBC connectivity doesnt work. > drv <- JDBC(driverClass = "org.apache.hadoop.hive.jdbc.HiveDriver", + classPath = c(list.files("D:/Mothi_16-07-14/hive_Jar_Files/",pattern="jar$",full.names=T), + list.files("D:/Mothi_16-07-14/hadoop_Jar_Files/",pattern="jar$",full.names=T))) > conn <- dbConnect(drv, "jdbc:hive2://54.77.162.203:10000/default","","") – Mothi Oct 14 '14 at 12:31
  • Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Invalid URL: jdbc:hive2://54.77.162.203:10000/default – Mothi Oct 14 '14 at 12:32
  • If i use driverClass = "org.apache.hive.jdbc.HiveDriver" then am getting Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars...Please help ! – Mothi Oct 14 '14 at 12:32
0

Same error happened to me earlier when I was trying to use RJDBC to connect to Cassandra, it was solved by putting the Cassandra JDBC dependencies in your JAVA ClassPath.

See this answer:

Community
  • 1
  • 1
Madcat
  • 379
  • 2
  • 7
0

For anyone who finds this post there are a couple things you can try to fix the problem:

1.) reinstall rJava from source install.packages("rJava","http://rforge.net/",type="source")

2.) Initiate java debugger for loading and try to connect again .jclassLoader()$setDebug(1L)

3.) I've had to use both Sys.setenv(JAVA_HOME = /Path/to/java) before and utilize dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/server/libjvm.dylib') to find the right jvm library.

4.) As stated rJava load error in RStudio/R after "upgrading" to OSX Yosemite, you can also create a link between the the libjvm.dylib to /usr/local/lib

sudo ln -f -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib

If all of these fail, a uninstall and install of R has also worked for me in the past.

Community
  • 1
  • 1
Jenks
  • 1,950
  • 3
  • 20
  • 27
0

This has helped me so far.

1) First check if the hive service is running, if not restart it.

sudo service hive-server2 status
sudo service hive-server2 restart

2) install rJava and RJDBCin R.

library(rJava)
library(RJDBC)

options(java.parameters = '-Xmx8g')
hadoop_jar_dirs <- c('/usr/lib/hadoop/lib',
                     '/usr/lib/hadoop',
                     '/usr/lib/hive/lib')
clpath <- c()
for (d in hadoop_jar_dirs) {
  clpath <- c(clpath, list.files(d, pattern = 'jar', full.names = TRUE))
}
.jinit(classpath = clpath)
.jaddClassPath(clpath)

hive_jdbc_jar <- '/usr/lib/hive/lib/hive-jdbc-2.1.1.jar'
hive_driver <- 'org.apache.hive.jdbc.HiveDriver'
hive_url <- 'jdbc:hive2://localhost:10000/default'
drv <- JDBC(hive_driver, hive_jdbc_jar)
conn <- dbConnect(drv, hive_url)
show_databases <- dbGetQuery(conn, "show databases")

show_databases

Make sure to give correct path to hadoop_jar_dirs, hive_jdbc_jar and hive_driver.

Vaibhav Sahu
  • 344
  • 2
  • 8
0

I wrote a package for dealing with this (and kerberos):

devtools::install_github('nfultz/hiveuberjar')

require(DBI)
con <- dbConnect(hiveuberjar::HiveUber(), url="jdbc://host:port/schema")
dbListTables(con)
dbGetQuery(con, "Select count(*) from nfultz.iris limit 10")
Neal Fultz
  • 9,282
  • 1
  • 39
  • 60