
The dbDisconnect() function of the RMySQL library is not closing the DB connection properly. Here is the code I tried locally to test it:

library(RMySQL)

execute.query <- function(query){
  # open a fresh connection for this query alone ...
  mydb_con <- dbConnect(MySQL(), user='username', password='pass', dbname='student', host='db_ip')
  returned_data <- dbGetQuery(mydb_con, query)
  # ... and close it again before returning
  dbDisconnect(mydb_con)
  return(returned_data)
}

for(i in 1:100){
  print(paste0("i = ", i))
  execute.query("select now()")
}

This executes 100 SQL queries sequentially and should, ideally, open and close a new DB connection for each query. But while running the code above, monitoring with `watch 'netstat -nat | grep {db_ip} | wc -l'` shows the number of DB connections climb from 0 to 20 over the first 20 queries, hover around 18-24 connections until the 100th query, and only drain back to 0 about 10 seconds after the program finishes.

So my question is: why is dbDisconnect not closing the DB connection? Is there some nuance I am missing here?

Because of this issue, my production R code (a huge, multithreaded legacy code base) fails with Failed to connect to database: Error: Can't create TCP/IP socket (24) once the connection count reaches 1000+. (Errno 24 is EMFILE, i.e. too many open files for the process.)

Ajithkumar_sekar

2 Answers


Ideally, in a multithreaded environment you should keep the dbConnect in a static block and the query execution in a separate method, then call dbDisconnect at the very end, in the equivalent of a Java finally block. When you get connection exceptions, you should try to reconnect. If you open a new connection for every query in a multithreaded environment, you will definitely have many files open simultaneously and you will get this error.
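
A minimal sketch of that pattern in R, reusing the connection details from the question (reconnect-on-error handling is omitted for brevity):

library(DBI)
library(RMySQL)

# connect once at startup (the "static block" equivalent)
mydb_con <- dbConnect(MySQL(), user='username', password='pass', dbname='student', host='db_ip')

# query execution lives in its own method and reuses the shared handle
execute.query <- function(query){
  dbGetQuery(mydb_con, query)
}

# tryCatch(finally = ...) is R's equivalent of Java's try/finally:
# the single disconnect runs whether the queries succeed or throw
tryCatch({
  for(i in 1:100) execute.query("select now()")
}, finally = dbDisconnect(mydb_con))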

Adil Khan
  • I agree that I should ideally create the connection in a static block and reuse it, but as I said, this is a legacy code base and I am reluctant to change that logic. If dbDisconnect worked properly, then even in a multithreaded environment it would not have more than 100 connections at a time – Ajithkumar_sekar Jan 30 '21 at 13:02
  • https://stackoverflow.com/questions/33098894/rpostgresql-cannot-close-connections has a good explanation of how your reference gets updated in a multithreaded environment, as well as the significance of on.exit, which guarantees the DB connection is closed (see the sketch below). – Adil Khan Jan 30 '21 at 14:23
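
For reference, a sketch of the on.exit pattern described in that link, applied to the question's function: the disconnect is registered as soon as the connection is opened, so it runs even if dbGetQuery throws.

execute.query <- function(query){
  mydb_con <- dbConnect(MySQL(), user='username', password='pass', dbname='student', host='db_ip')
  # guaranteed to run when the function exits, whether normally or on error
  on.exit(dbDisconnect(mydb_con), add = TRUE)
  dbGetQuery(mydb_con, query)
}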

I would prefer the dbSendQuery() function, ideally from the odbc package, and, as the previous answer stated, just leave the connection open and send the subsequent queries over it. If you want to make sure the connection is still valid, you can check it with dbIsValid and reconnect if it is not. That should make the query execution pretty fail-safe. Caveat: the code is untested and may need adapting to your setup.

# note: MySQL() comes from RMySQL, so the DBI generics are used here;
# with the odbc package you would connect via odbc::odbc() and a DSN instead
library(DBI)
library(RMySQL)

mydb_con <- dbConnect(MySQL(), user='username', password='pass', dbname='student', host='db_ip')

execute.query <- function(query){
  # reconnect if the long-lived connection has gone stale
  if(!dbIsValid(mydb_con)){
    mydb_con <<- dbConnect(MySQL(), user='username', password='pass', dbname='student', host='db_ip')
  }
  result <- dbSendQuery(mydb_con, query)
  returned_data <- dbFetch(result)
  dbClearResult(result)  # release the result set before the next query
  return(returned_data)
}
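
With that in place, the loop from the question reuses the single connection, and the one shared handle is closed only when everything is done (a sketch, assuming the setup above):

for(i in 1:100){
  execute.query("select now()")
}
dbDisconnect(mydb_con)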
hannes101
  • Thanks for your answer. I will try this approach if I can't find the cause of the DB connection leak. By the way, why do you prefer odbc over the RMySQL package? – Ajithkumar_sekar Jan 30 '21 at 13:07
  • I think it's faster and better maintained than DBI, but I can't give you any specific facts. I am also using `build_sql` a lot from the `dbplyr` package, which is great for building SQL commands with R insertions. – hannes101 Jan 30 '21 at 18:06