11

I'm trying to connect to Amazon Athena via JDBC and pool:

What has worked so far:

library(RJDBC)
library(DBI)
library(pool)
library(dplyr)
library(dbplyr)

drv <- RJDBC::JDBC('com.amazonaws.athena.jdbc.AthenaDriver', '/opt/jdbc/AthenaJDBC41-1.1.0.jar')

pool_instance <- dbPool(
  drv = drv,
  url = "jdbc:awsathena://athena.us-west-2.amazonaws.com:443/",
  user = "me",
  s3_staging_dir = "s3://somedir",
  password = "pwd"
)

mydata <- DBI::dbGetQuery(pool_instance, "SELECT * 
                                          FROM myDB.myTable
                                          LIMIT 10")

mydata

---> Works fine. Correct data is beeing returned.

That does not work:

pool_instance %>% tbl("myDB.myTable") %>% head(10)
# Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ",  : 
#   Unable to retrieve JDBC result set for SELECT *
# FROM "myDB.myTable" AS "zzz2"
# WHERE (0 = 1) ( Table myDB.myTable not found. Please check your query.)

The problem here is that Athena expects the following syntax as SQL:

Either:

SELECT *
FROM "myDB"."myTable"

Or:

SELECT *
FROM myDB.myTable

So basically, by passing the string "myDB.myTable":

pool_instance %>% tbl("myDB.myTable") %>% head(10)

The following syntax is being used:

SELECT *
FROM "myDB.myTable"

which results in the following error since such table doesn't exist:

# Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ",  : 
#  Unable to retrieve JDBC result set for SELECT *
# FROM "myDB.myTable" AS "zzz6"
# WHERE (0 = 1) ( Table myDB.myTable not found. Please check your query.)

What I have tried:

So therefore I have tried to pass either "myDB"."myTable" or myDB.myTable to tbl() unsuccessfully:

I have tried to use capture.output(cat('\"myDB\".\"myTable\"')):

pool_instance %>% tbl(capture.output(cat('\"myDB\".\"myTable\"'))) %>% head(10)
# Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ",  : 
#   Unable to retrieve JDBC result set for SELECT *
# FROM """myDB"".""myTable""" AS "zzz4"
# WHERE (0 = 1) ( Table ""myDB"".""myTable"" not found. Please check your query.)

pool_instance %>% tbl(noquote("myDB"."myTable") %>% head(10)
# Error in UseMethod("as.sql") : 
#   no applicable method for 'as.sql' applied to an object of class "noquote"
Scarabee
  • 5,437
  • 5
  • 29
  • 55
Googme
  • 914
  • 7
  • 27

1 Answers1

4

You can use dbplyr::in_schema:

pool_instance %>% tbl(in_schema("myDB", "myTable")) %>% head(10)
Scarabee
  • 5,437
  • 5
  • 29
  • 55
  • Hi @Scarabee, thank for your answer. Did your ever tried to use `dbListFields(pool, tbl)` with in_schema to return fields like `dbListFields(pool, tbl(in_schema("myDB", "myTable")))` from a JDBC connection as the one mentioned above? – Googme Jun 25 '18 at 19:22
  • @Googme: No I've never tried, and I don't have the necessary environment to test that now. Sorry! – Scarabee Jun 26 '18 at 11:16