
Every day, I connect to a corporate database from within R. Currently, I have an R script which sets up the connection and undertakes all pre-analysis data preparation (mutates and joins). I would like to include this in an R package, `DWH`, so I can share this process with colleagues.

Here is some sample code:

creds <- read.csv2('C:/creds.csv')

con <- ROracle::dbConnect(drv = DBI::dbDriver("Oracle"),
                               username = creds$user, password= creds$pwd,
                               dbname = 'DWH',
                               prefetch = FALSE, bulk_read = 1000L, stmt_cache = 0L,
                               external_credentials = FALSE, sysdba = FALSE)

PREST <- dplyr::tbl(con, dbplyr::in_schema('ADWH', 'PREST')) %>%
    transmute(date = A_DWH_DATE_TODAY,
              amount = A_DWH_POL_IND_COLL_AMOUNT)

Ideally, I would like to load the package with library(DWH), and then the PREST object should be available to everyone for further use, e.g.

library(DWH)
PREST %>% filter(date > 201912) 

This works perfectly well when sourcing the R script. How can I implement this inside an R package? I am using ROracle, but the process should be the same with any other ODBC database connection.

  • This question is really asking the broader question of how to build a package with dependencies. Read docs, tutorials, posts, etc. on developing R packages with dependent libraries. – Parfait Apr 30 '20 at 14:33
  • Also, using `ROracle` is not using ODBC. You can connect to Oracle with `odbc`, but that requires the Oracle ODBC driver. To integrate that solution, users would need to install the additional driver themselves before the connection would work. – Parfait Apr 30 '20 at 14:34
  • Including the connection authentication (user/pass) in a package is generally frowned upon, both for security and package maintainability. You should be able to use the `config` package or perhaps envvars to pass user/pass; it is much better for credentials to be provided/controlled by the user(s). – r2evans Apr 30 '20 at 15:35
  • As for including `PREST` in your package, you would likely need to make it a *function* that pulls the data dynamically instead, so your users would need to use something like `PREST() %>% transmute(...)`. You'll need to do connection management within your package, as in (1) `library(DWH)` does *not* initiate a connection; (2) functions either need to open, query, and close, or all of those functions reference an internal variable that stores the connection, tests if it is still open (timeout?), reconnects if needed, etc. – r2evans Apr 30 '20 at 15:35
  • You might prefer using `pool` in addition to your `DBI` packages, since it does auto-reconnect fairly robustly (in my experience). – r2evans Apr 30 '20 at 15:36
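
A minimal sketch of the function-based approach the comments describe, with credentials read from environment variables rather than shipped in the package. The env-var names (`DWH_USER`, `DWH_PWD`) and the helper `dwh_connect()` are hypothetical, not part of ROracle or any existing package:

```r
# Package-internal environment that caches the connection across calls
.dwh_env <- new.env(parent = emptyenv())

dwh_connect <- function() {
  # Reuse the cached connection if it is still valid, otherwise (re)connect
  if (!is.null(.dwh_env$con) && DBI::dbIsValid(.dwh_env$con)) {
    return(.dwh_env$con)
  }
  .dwh_env$con <- ROracle::dbConnect(
    drv      = DBI::dbDriver("Oracle"),
    username = Sys.getenv("DWH_USER"),  # set by each user, not the package
    password = Sys.getenv("DWH_PWD"),
    dbname   = "DWH"
  )
  .dwh_env$con
}

# PREST becomes a function, as suggested above
PREST <- function() {
  dplyr::tbl(dwh_connect(), dbplyr::in_schema("ADWH", "PREST")) %>%
    dplyr::transmute(date = A_DWH_DATE_TODAY,
                     amount = A_DWH_POL_IND_COLL_AMOUNT)
}
```

Users would then call `PREST() %>% dplyr::filter(date > 201912)`. Note that `%>%` inside a package requires importing it from `magrittr` (or using the base `|>` pipe on R ≥ 4.1).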

1 Answer


While I agree with @r2evans that this is not a good design decision, if you are committed to doing it this way, one approach you could take is a function that assigns variables to the global environment (as per this question).

This would look something like:

create_globals = function(){
    creds <- read.csv2('C:/creds.csv')

    con <- ROracle::dbConnect(drv = DBI::dbDriver("Oracle"),
                                   username = creds$user, password= creds$pwd,
                                   dbname = 'DWH',
                                   prefetch = FALSE, bulk_read = 1000L, stmt_cache = 0L,
                                   external_credentials = FALSE, sysdba = FALSE)

    PREST <- dplyr::tbl(con, dbplyr::in_schema('ADWH', 'PREST')) %>%
        transmute(date = A_DWH_DATE_TODAY,
                  amount = A_DWH_POL_IND_COLL_AMOUNT)

    assign("creds", creds, envir = .GlobalEnv)
    assign("con", con, envir = .GlobalEnv)
    assign("PREST", PREST, envir = .GlobalEnv)
}

Then your package use looks like:

library(DWH)
create_globals()
PREST %>% filter(date > 201912) 
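
As an alternative sketch that avoids writing to `.GlobalEnv` (which `R CMD check` flags), you could cache the objects in a package-internal environment and expose a getter. The names `.pkg_cache` and `get_PREST()` are illustrative only:

```r
# Internal cache, created when the package is loaded
.pkg_cache <- new.env(parent = emptyenv())

get_PREST <- function() {
  # Build the lazy tbl once, then reuse it on subsequent calls
  if (is.null(.pkg_cache$PREST)) {
    creds <- read.csv2("C:/creds.csv")
    con <- ROracle::dbConnect(DBI::dbDriver("Oracle"),
                              username = creds$user, password = creds$pwd,
                              dbname = "DWH")
    .pkg_cache$PREST <- dplyr::tbl(con, dbplyr::in_schema("ADWH", "PREST")) %>%
      dplyr::transmute(date = A_DWH_DATE_TODAY,
                       amount = A_DWH_POL_IND_COLL_AMOUNT)
  }
  .pkg_cache$PREST
}
```

Usage would then be `get_PREST() %>% dplyr::filter(date > 201912)`, which keeps the user's global environment untouched.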
Simon.S.A.