11

I would like to understand what is the best practice for (re)using SQL connections to a MS SQL database through RJDBC.

I can imagine three possible scenarios:

  1. Store a connection in a global variable, initialize once, use it everywhere in the code
  2. Create a new connection for every request
  3. Do something more complicated, e.g. pre-populate a pool of open connection, and (re)use connections from the pool as needed.

I'm using my code in a shiny application with several dozens clients, and I'm afraid that something bad will happen if I use method 1. So I use method 2, creating a new connection for every request with the code below.

I can see some potential downsides of this approach: performance, taxing database resources, etc. But may be I'm too cautious since R is single-threaded, even in shiny usage scenario?

So my specific questions are:

A. Can I safely use a single connection to MS SQL database through RJDBC throughout my shiny application?

B. Are there any real downsides (memory leakage, performance, etc.) in scenario 2 above?


NewConnection <- function() {
  file = NULL
    # make it work on three different OSes - Linux, MacOS, Windows 
    for (path in c('/Users/victor/Documents/R/sqljdbc_3.0/enu/sqljdbc4.jar',
          '/home/oracle/sqljdbc_3.0/enu/sqljdbc4.jar',
          'C:/Projects/jdbc/sqljdbc_4.0/enu/sqljdbc4.jar')) {
      if (file.exists(path)) {
        file = path
          break
      }
    }
  if (is.null(file))
    return(NULL)
  else {
    drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", file)
      passwd <- GetUserNamePassword()
      conn <- dbConnect(drv, "jdbc:sqlserver://sql.server.address.com", 
          passwd$username, passwd$password)
      return(conn)
  }
}

P.S. Related: How to manage a database connection in an R Package

Community
  • 1
  • 1
Victor K.
  • 4,054
  • 3
  • 25
  • 38
  • 3
    Go with connection pooling. It works and has been tested extensively. This is the way to scale the database connections. – Namphibian Aug 31 '13 at 23:25
  • Thanks for the tip, @Namphibian. Since I never implemented this in R (or any other language, for that matter) - any idea on how should I start? E.g. how should I test which connections are unused? How do I "return" connection to the pool once I have used it? Any tips will be welcome. – Victor K. Sep 01 '13 at 05:58
  • 1
    Well most connection pools will handle this for you transparently. Instead of getting a connection from the driver you will just ask the connection pool. Not at computer at the moment. Will respond tomorrow with a proper answer. – Namphibian Sep 01 '13 at 08:15
  • 1
    Namphibian is right. AFAIK, unless explicitly asked for in the connectionstring, every connection will by default check the connection pool (managed by the OS) and then recycle an existing connection if available. Thus your application can 'safely' create a new connection as needed (and discard it when no longer needed). Word of warning (as you're using Java), this might only be true when your application runs under Windows; I can't tell about other OS's. – deroby Sep 14 '13 at 21:43
  • @deroby I have never heard that Windows manages a connection pool by default. Can you provide a reference? – Twinkles Dec 03 '13 at 10:22
  • @Twinkles well, admittedly, OS might not be the right word indeed; 'framework' would be a better fit. E.g. [Here](http://msdn.microsoft.com/en-us/library/8xx3tyca%28v=vs.110%29.aspx) says `By default, connection pooling is enabled in ADO.NET`. [Here](http://www.connectionstrings.com/connection-pooling/) says the same but as seen from the connection-string in general.. etc. I'm sure that more googling would simply show that pretty much any (mature) 'system/framework' will have pooling on by default. Unless you're writing your own connection layer I think it's safe to assume yours will too. – deroby Dec 03 '13 at 13:16

2 Answers2

1

Many Questions:

1) Reusing a connection is faster then establishing a new connection for every use. Depending on your code, this will speedup your application a little bit. But reusing connections is more complex. Thats the reason why many people use connection pools.

2) If your program has a short runtime you can work with one connection, e.g. in a global variable. If your application is a server application (long running), than you need to maintain your connection because the server can close the connection, if he thing that nobody use it because there runs no traffic over the connection. This could happen in the night times on server applications. The connection maintenance function is part of connection pools.

Summary. If your application a simple, not multi threaded, not server application, than reuse your single connection. Otherwise, use every time a new connection or use a connection pool.

Mirko Ebert
  • 1,349
  • 1
  • 18
  • 36
1

It might help to consider what happens behind the scenes every time you establish a connection:

  • A TCP/IP connection has to be established (including DNS lookup and contacting the SQL Server Browser to get the correct port number for a named instance)
  • The user needs to be authenticated and verified to be authorized to connect
  • Server side resources for the connection (private memory etc.) have to be reserved

Therefore it makes sense to limit the amount of connections used by your application.

If your application executes all transactions in sequence you should open the connection once and reuse it. Use a connection pool for a server-based multi-user application.

Twinkles
  • 1,984
  • 1
  • 17
  • 31