I have a database connection to PostgreSQL using the package RPostgreSQL. Currently I do the following:

  1. retrieve a list from my database
  2. run the list through a for loop, doing a calculation and writing the value back to the database

I am interested in parallelising this process. The obvious approach is the foreach construct from the package of the same name. However, that requires connection pooling, so I would like to know whether any parallel backend lets me share my database connection across workers. Here is a specific unresolved example:

foreach %dopar% + RPostgreSQL

In the above case the doMC backend (registered with registerDoMC) offers no connection pooling, and the workaround is to open and close the connection inside each %dopar% worker. The doSNOW backend (registered with registerDoSNOW), which builds on the snow package, does not provide this functionality either.

The alternative would be to use mclapply instead of %dopar%. In that case, does anyone know whether, and if so how, the database connection can be shared with each mclapply worker?

Alex

1 Answer

You can't share database connections between different workers in any of the general-purpose R parallel programming packages, because the workers are separate processes. However, you can create one connection per worker and have each worker reuse its own connection for every task that it executes. I discuss how to do that in an answer to the question that you cited.
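
As a rough illustration of the one-connection-per-worker idea, here is a minimal sketch using the base parallel package rather than foreach. The table name `results`, its columns `id` and `value`, the squared-id calculation, and all three helper function names are assumptions made up for the example; they are not from the question.

```r
library(parallel)  # ships with R

# Runs once on each worker: open a connection and keep it in the worker's
# global environment so that later tasks can reuse it.
open_worker_connection <- function(dbname) {
  library(RPostgreSQL)
  con <- dbConnect(dbDriver("PostgreSQL"), dbname = dbname)
  assign("con", con, envir = .GlobalEnv)
  NULL  # avoid sending the connection object back to the master
}

# Runs once per task: reuse the worker's connection instead of reconnecting.
process_id <- function(id) {
  con <- get("con", envir = .GlobalEnv)
  value <- id^2  # placeholder calculation (assumption)
  # Hypothetical table and column names.
  sql <- sprintf("UPDATE results SET value = %f WHERE id = %d",
                 value, as.integer(id))
  dbGetQuery(con, sql)
  value
}

# Runs once on each worker at the end: close that worker's connection.
close_worker_connection <- function() {
  dbDisconnect(get("con", envir = .GlobalEnv))
  NULL
}

run_parallel_updates <- function(ids, dbname, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)

  clusterCall(cl, open_worker_connection, dbname)  # one connection per worker
  values <- parLapply(cl, ids, process_id)         # tasks share it per worker
  clusterCall(cl, close_worker_connection)

  unlist(values)
}
```

A call such as `run_parallel_updates(ids, dbname = "mydb")` would then open `n_workers` connections in total, however many ids are processed. The helpers are defined at top level so that only the functions themselves, and not an enclosing environment, are shipped to the workers.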

Steve Weston