0

Am relatively new to both databases and Redshift, and I'd like to use the Connections within R Studio to be able to query a set of databases to extract subsets of data.

Specific Question: after connecting to the correct database in Redshift, how do I then specify sub databases and their tables using dbplyr?

The main database is production, and within that there are several sub-databases (?) such as customerdb, supplierdb, membersdb .. and each of these have several tables. Say I want to subset the data of the table companies within customerdb, I'm trying to do this by:

# `con` is the connection to Redshift database via DBI and it connects successfully
dataset <- tbl(con, "customerdb.companies")

Error in new_result(connection@ptr, statement) : 
nanodbc/nanodbc.cpp:1344: 42P01: [Amazon][Amazon Redshift] (30) 
Error occurred while trying to execute a query: [SQLState 42P01] 
ERROR:  relation "customerdb.companies" does not exist

Have looked at various resources, from AWS to dbplyr's page, as well as other SO questions and none seem to show this setup style or methods to connect to underlying tables.

daRknight
  • 253
  • 3
  • 17
  • I am not sure what you mean by "sub-databases". Have you tried `dataset <- tbl(con, "companies")`? – dmca Nov 16 '18 at 16:51
  • I wasn't sure either of the term hence the (?) .. basically the Redshift database has several databases within it and their own set of tables. Your command returns the same error in my question. I also tried to modify the connection to Redshift from `production` to say `production.customerdb` or directly to `customerdb` and it fails to connect. By this I mean the `database` value in: `dbConnect(odbc::odbc(), Driver = , server name = , database = production, ...` for example to `, database = customerdb` or `database = production.customerdb` – daRknight Nov 16 '18 at 20:17
  • A "sub-database" is actually called a _schema_. You can have a look at [this answer](https://stackoverflow.com/questions/45032368/how-to-use-dplyr-tbl-on-a-sql-server-non-standard-schema-table/45036818#45036818). – Scarabee Nov 19 '18 at 12:50
  • @Scarabee I think this is precisely it .. From your suggestion I was able to successfully execute `tbl(con, in_schema("customerdb", "companies")` and did a basic `filter()` verb on it, then saved the result to my local environment as a `data.frame`.. also thank you for highlighting the right term, schema; I will link it to the answer you provided – daRknight Nov 19 '18 at 21:19
  • Nice, glad it helped! And no need to create a new answer here, it's better to mark this question as a duplicate, as you just did :) – Scarabee Nov 19 '18 at 21:22
  • An oddity I've noticed is that the table names and field names are all truncated, and just show the first letter, so from the example the schema of `customerdb`, `supplierdb`, `membersdb`, shows in my environment under the _Connections_ tab as `c`, `s`, `m`, etc ... and the field names of `id`, `name`, .. show as `i`, `n` .. but I'm now unsure if this is a `dbplyr` thing, a `DBI` connection issue, or an RStudio issue...... – daRknight Nov 19 '18 at 21:35
  • Have you seen [this page](https://support.rstudio.com/hc/en-us/articles/115011264307-Troubleshooting-Connections)? Looks like they describe your exact problem: _Why does my database catalog only show the first letter of each table?_ – Scarabee Nov 20 '18 at 11:38
  • 1
    @Scarabee thank you yet again .. this precisely solved it, `UTF-16` v `UTF-32` – daRknight Nov 20 '18 at 19:28

0 Answers0