8

I am trying to connect to HiveServer2 via JDBC drivers from R using RJDBC package. I have seen a broad explanation on passing additional arguments to dbConnect wrapper for various drivers(What arguments can I pass to dbConnect?), but there appear that situation with JDBCDriver is a bit tricker than for other drivers. I can connect to HiveServer2 under this specific URL adress url = paste0("jdbc:hive2://", host = 'tools-1.hadoop.srv', ":", port = 10000, "/loghost;auth=noSasl") . The correspoding code works and enables me to write statements on Hive from R

library(RJDBC)
dbConnect(drv = JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
                     classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
                                   "/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
                                   "/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
                     identifier.quote = "`"), # to juz niekoniecznie jest potrzebne
          url = paste0("jdbc:hive2://", host = 'tools-1.hadoop.srv', ":", port = 10000, "/loghost;auth=noSasl"),
          username = "mkosinski") -> conn

I am wondering if there is a way to pass arguments such as database name (loghost) or a no_authentication_mode (auth=noSasl) to ... in dbConnect such that I could only specify standard URL address (url = paste0("jdbc:hive2://", host = 'tools-1.hadoop.srv', ":", port = 10000)) and somehow pass the rest of parametrs like this

library(RJDBC)
dbConnect(drv = JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
                     classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
                                   "/usr/share/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar",
                                   "/usr/share/hadoop/share/hadoop/common/hadoop-common-2.4.1.jar"),
                     identifier.quote = "`"), # to juz niekoniecznie jest potrzebne
          url = paste0("jdbc:hive2://", host = 'tools-1.hadoop.srv', ":", port = 10000),
          username = "mkosinski", dbname = "loghost", auth = "noSasl") -> conn

But the second approach doesn't look to work, despite the various combinations of names and values of additional arguments I try.

Does anyone know how to pass additional arguments to DBI::dbConnect through ... parameter for JDBCDriver?

Community
  • 1
  • 1
Marcin
  • 7,834
  • 8
  • 52
  • 99
  • Actually the URL could get as complicated as `jdbc:hive2://:/;principal=hive/_HOST@;ssl=true;sslTrustStore=;trustStorePassword=;transportMode=http;httpPath=` with Kerberos authentication and HTTPS transport with dummy SSL certificates. And even worse when throwing High Availability based on ZooKeeper quorum (that's not a joke, we have all these in Prod...) – Samson Scharfrichter Jan 18 '16 at 22:59
  • I get it @SamsonScharfrichter :) . That's quite interesting. But what is the solution to my question? How can I pass those params? – Marcin Jan 19 '16 at 11:11
  • 1
    I've no idea if, and how, you can pass all these URL options/parameters as JDBC parameters. The SSL options are actually used by the driver so they probably have to be set on the URL; the database can be changed dynamically at any time using a *statement* (i.e. `use MY_DB`); for the rest... did you try to connect with DBVisualizer or a similar tool, then check the list of JDBC params and their default values? – Samson Scharfrichter Jan 20 '16 at 00:18
  • I don't think you pass it as `JDBC` parameters, but as `dbConnect` parameters on `JDBCDriver` . I haven't heard about DBVisualizer, but I'll google it out :) – Marcin Jan 20 '16 at 12:31
  • @SamsonScharfrichter I've found a list for supported drivers https://www.dbvis.com/doc/database-drivers/ - do you think the H2 driver correspond to the HiveDriver/HiveServer2? – Marcin Jan 20 '16 at 13:55
  • 1
    By the way I just checked on my PC and... there are no pre-defined Database Properties for Hive. Sorry, that won't help :-( – Samson Scharfrichter Jan 20 '16 at 14:14
  • 1
    *[Edit]* Nope. H2 is a fork from HypersonicSQL (HSQL). DBVis has no predefined settings for "Hive" and they don't provide a driver *(too many differences between versions and distros)*; you must drop the JARs on your PC and use [Tools][Driver Manager] menu to bind them to a custom "Hive" entry. Then you can create a Connection based on that driver (using "Database URL" settings). – Samson Scharfrichter Jan 20 '16 at 14:15
  • How have you checked that `... there are no pre-defined Database Properties for Hive` ? And you mean in HypersonicSQL or dbConnect / JDBC ? – Marcin Jan 20 '16 at 14:21
  • But there was a link to setting with hiveser2: https://github.com/cyanfr/dbvis_to_hortonworks_hiveserver2/wiki/How-I-Connected-DBVisualizer-9.2.2-on-Windows-to-Hortonwork-HiveServer2 – Marcin Jan 20 '16 at 15:03
  • In DbVisualizer 9.2.10 there were some Hive specific fixes and improvements (https://www.dbvis.com/doc/relnotes/?version=9.2.10). Please note that Hive is still not an officially supported database in DbVisualizer . – roger Jan 21 '16 at 06:46
  • try to use RJDBC. works for me. https://stackoverflow.com/questions/20059378/postgres-db-cant-connect-to-r-with-rjdbc – Lychenus Jun 16 '22 at 13:24

2 Answers2

1

According to the author's answer: https://github.com/s-u/RJDBC/issues/31#issuecomment-173934951

Simply anything - all that dbConnect does is to collect whatever you pass (including ...) and collect it all into a property dictionary (java.util.Properties) that is passed to the driver's connect() method. So any named argument you pass is included. So the only special argument is url which is passed directly, everything else is included in the properties. How that gets interpreted is out of RJDBC's hands - it's entirely up to the driver.

Marcin
  • 7,834
  • 8
  • 52
  • 99
0

there you can use the full url

library(RJDBC)
drv <- JDBC("org.postgresql.Driver","C:/R/postgresql-9.4.1211.jar")
con <- dbConnect(drv, url="jdbc:postgresql://host:port/dbname", user="<user name>", password="<password>")
Lychenus
  • 1
  • 1