9

I have an API built with R plumber that connects to a PostgreSQL database using RPostgreSQL and pool (although this would also apply if I was using a Shiny app):

# create the connection pool
pool <- dbPool(
  drv = PostgreSQL(),
  host = Sys.getenv("DB_HOST"),
  port = 5432,
  dbname = "db",
  user = Sys.getenv("DB_USER"),
  password = Sys.getenv("DB_PASSWORD")
)

# start the API
pr <- plumb("plumber.R")

# on stop, close the pool
pr$registerHooks(
  list("exit" = function() { poolClose(pool) })
)

I want to import new data every day. The easiest way is to create a new database and promote it to production:

CREATE DATABASE db_new;
-- create the tables
-- bulk-insert the data
SELECT pg_terminate_backend (pid) FROM pg_stat_activity WHERE datname = 'db';
DROP DATABASE db;
ALTER DATABASE db_new RENAME TO db;

This is fast and minimizes downtime. The problem is that pool then loses is connection to the database and doesn't automatically attempt to reconnect:

> tbl(pool, "users")
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
)

Even if I wasn't replacing the database every day, DB servers occasionally restart and that will also cause my app to break. Reconnection doesn't seem to be a feature of pool, RPostgreSQL, nor DBI. Does anyone know a way to deal with this problem?

Devin
  • 851
  • 12
  • 32
  • Slightly related: https://github.com/rstudio/pool/issues/72 (and though initiated last year, it has not seen any attention yet, unfortunately). – r2evans Mar 04 '19 at 18:17
  • Are we talking about a shiny app? – LocoGris Mar 04 '19 at 18:18
  • @JonnyCrunch, it's listed as a plumber API endpoint, which might be served along side shiny apps (e.g., in RStudio Connect) but is a little different. – r2evans Mar 04 '19 at 18:22
  • My proposal was an "observerEvent" `shiny` style which should be linked to a `reactive` checking if the connection was lost, but I do not have a clue if that apply to this kind of problem – LocoGris Mar 04 '19 at 19:01
  • Why not in your connection string connect to a different db, or public db and when you issue the terminate command it won't kickout your own session? – Soren Mar 04 '19 at 19:49
  • I think the premise of a "reconnect" is quite relevant whether this specific question could also do something else. – r2evans Mar 04 '19 at 21:03
  • Would it be an option to replace a schema instead of replacing the entire database? What does `dbIsValid()` return for the connection after renaming the database? – krlmlr Mar 12 '19 at 22:45

2 Answers2

8

I recently encountered a similar issue due to MySQL connections being closed when the instance's wait_timeout was exceeded. I came across your post on RStudio Community, and was inspired by your solution. In case you are still making use of it, and are in search of a solution that avoids the extra query while wrapping the actual functions you use, here is a reprex demonstrating something I came up with, along with an example proving it works:

library(dplyr, warn.conflicts = FALSE)
library(pool)
library(RMariaDB)

generate_safe_query <- function(pool) {
  function(db_function, ...) {
    tryCatch({
      db_function(pool, ...)
    }, error = function(e) {
      if (grepl("Lost connection to MySQL server during query", e$message)) {
        # Preserve `validationInterval` so that it can be restored
        validation_interval <- pool$validationInterval
        # Trigger destruction of dead connection
        pool$validationInterval <- 0
        refreshed_connection <- poolCheckout(pool)
        poolReturn(refreshed_connection)
        # Restore original `validationInterval`
        pool$validationInterval <- validation_interval
        # Execute the query with the new connection
        db_function(pool, ...)
      } else {
        # Unexpected error
        stop(e)
      }
    })
  }
}

mysql_pool <- dbPool(MariaDB(),
                     host = "127.0.0.1",
                     username = "root",
                     password = "",
                     dbname = "test")

safe_query <- generate_safe_query(mysql_pool)

# Works
safe_query(tbl, "notes")
#> # Source:   table<notes> [?? x 2]
#> # Database: mysql 8.0.15 [root@127.0.0.1:/test]
#>      id note 
#>   <int> <chr>
#> 1     1 NOTE1

# Set the `wait_timeout` to 5 seconds for this session
invisible(safe_query(dbExecute, "SET SESSION wait_timeout = 5"))

# Wait longer than `wait_timeout` to trigger a disconnect
Sys.sleep(6)

# Still works; warning will appear notifying that connection was
# destroyed and replaced with a new one
safe_query(tbl, "notes")
#> Warning: It wasn't possible to activate and/or validate the object. Trying
#> again with a new object.
#> # Source:   table<notes> [?? x 2]
#> # Database: mysql 8.0.15 [root@127.0.0.1:/test]
#>      id note 
#>   <int> <chr>
#> 1     1 NOTE1

safe_query(poolClose)
# Or, equivalently: 
# poolClose(mysql_pool)

Created on 2019-05-30 by the reprex package (v0.3.0)

The function returned by generate_safe_query will work with any database query function (e.g. dbExecute, dbGetQuery, etc.). Obviously, you'll want to update the error message it matches to suit your needs.

I have also opened my own Community topic on an option I think should be included in dbPool that would alleviate the need for such workarounds.

hugo-pa
  • 96
  • 1
  • 4
  • Thanks, this is great. I actually ended up rewriting the API I was working on in Python with SQLAlchemy, but I have others that could benefit from this solution. – Devin May 31 '19 at 22:08
  • Glad to hear you know others whom it could help! I also posted the solution on my Community topic that I linked to, and made it a bit more general in that it can match against more than one error message. – hugo-pa Jun 01 '19 at 18:07
0

I am using plain DBI (without pool) with below function to always have an active connection for the DBI call (e.g. DBI::dbExistsTable(rdsConnect(), "mytable")).

#' Connect returns a database connection.
#' Retrieves the connection parameters from configuration.
#' 
#' FIXME: dbIsValid is not implemented
#' https://github.com/tomoakin/RPostgreSQL/issues/76
#' workaround implemented with isPostgresqlIdCurrent()
#' @return rds allocated connection
rdsConnect <- function() {
  if (!((exists("rds") && (isPostgresqlIdCurrent(rds))))) {
    source('./config.R', local = TRUE)
    print("New PostgreSQL connection")
    rds <<- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
      dbname = rds_params("rds_database"),
      host = rds_params("rds_host"),
      user = rds_params("rds_user"),
      password = rds_params("rds_password")
    )
  } else print("Valid PostgreSQL connection")
  return(rds)
}
nordic70
  • 130
  • 2
  • 9