
I'm building an R package whose main purpose is to abstract away the pain of dealing with a proprietary database that requires some fairly complex SQL queries in order to get data out.

As such, the connection to the Microsoft SQL Server (obtained by odbcDriverConnect) is a constant and important part of this package, but I can't work out how best to manage this and I'm hoping for advice as to how this should be implemented in R.

My current thoughts are:

  1. Make the user ensure they have a valid connection before they call any function. Each function then has connection as a parameter which must be passed. This puts a burden on the user.

  2. In every function, make a call to get.connection(), which will get a new connection each time. Old connections are then left to time out naturally, which seems a sloppy approach.

  3. As above, but return the same connection each time. This appears not to be a viable proposition as I can't prevent connections from timing out through R. autoReconnect=TRUE and other tricks I've used in different languages seem to have no effect.

In Java, I would probably have a DatabaseConnectionPool populated with a number of connections and simply grab connections from, and return them to, that pool as needed. I also don't seem to have the timeout issue in Java when I specify autoReconnect=TRUE.

Any suggestions much appreciated.

Ina

2 Answers


pool is an R package for pooling connections to resources such as databases. If you're cool with using a GitHub package, take a look at https://github.com/rstudio/pool. It will reuse or recreate the connection as required.
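A minimal sketch of what using it might look like, assuming the pool, DBI and RSQLite packages are installed. The in-memory SQLite database is only a stand-in so the snippet runs anywhere. As far as I know, pool works through DBI drivers rather than RODBC's odbcDriverConnect, so for SQL Server you would presumably pass a DBI-compatible driver (for example odbc::odbc()) with your own connection details instead.

```r
library(DBI)
library(pool)

# Stand-in backend so the example is self-contained; this is an assumption,
# not the SQL Server setup from the question.
db_pool <- dbPool(drv = RSQLite::SQLite(), dbname = ":memory:")

# The pool object can be passed to DBI functions in place of a connection:
# a connection is checked out, the query runs, and the connection is returned.
result <- dbGetQuery(db_pool, "SELECT 1 AS x")
print(result)

# Close the pool once it is no longer needed.
poolClose(db_pool)
```

In a package you would probably create the pool once (for example lazily on first use) and close it when the package is unloaded.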

dsz

A mix of the second and third approaches seems a reasonable solution: return the same connection each time, but before returning it, check whether it is still open, and create a new connection if it is not.

This is basically a manual implementation of autoReconnect=TRUE.
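A rough sketch of that idea, not a definitive implementation. The names .conn_cache, rodbc_connect, rodbc_is_alive and get_connection are hypothetical, and the connection string is a placeholder. The liveness check assumes that running a trivial query is an acceptable way to test the handle; the connect and check functions are passed as arguments so they can be swapped out.

```r
# Package-private environment holding the cached connection.
.conn_cache <- new.env(parent = emptyenv())

# Hypothetical helper: opens a new RODBC connection.
# The connection string below is a placeholder, replace it with your own.
rodbc_connect <- function() {
  RODBC::odbcDriverConnect(
    "driver={SQL Server};server=myserver;database=mydb;trusted_connection=true"
  )
}

# Hypothetical helper: TRUE if a trivial query still succeeds on the handle.
# With errors = FALSE, sqlQuery returns a data frame on success and a
# non-data-frame value on failure; a closed channel raises an error instead.
rodbc_is_alive <- function(conn) {
  tryCatch(
    is.data.frame(RODBC::sqlQuery(conn, "SELECT 1 AS ok", errors = FALSE)),
    error = function(e) FALSE
  )
}

# Return the cached connection, reconnecting if it is missing or dead.
get_connection <- function(connect = rodbc_connect, is_alive = rodbc_is_alive) {
  conn <- .conn_cache$conn
  if (is.null(conn) || !is_alive(conn)) {
    conn <- connect()
    .conn_cache$conn <- conn
  }
  conn
}
```

Each exported function in the package can then call get_connection() internally, so the user never has to pass a connection around. The cost is one extra round trip per call for the liveness check.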

iTech