
I'd like to use foreach to run SQL queries in parallel. As described in this answer, it's possible to do this using the doParallel package to set up one database connection per worker, and then use foreach to run queries on the workers. As noted in that answer, the one-connection, multiple-query structure is more efficient than one-connection, one-query (there's a bit of overhead associated with creating and destroying SQL connections, and you run the risk of overwhelming the server with connections).
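For reference, here's a sketch of the pattern from the linked answer as I understand it. The database, table, and `queries` vector are placeholders, and I'm using DBI/RSQLite purely for illustration; any DBI backend should behave the same way:

```r
library(doParallel)
library(foreach)

cl <- makeCluster(4)
registerDoParallel(cl)

# Worker initialization: open one connection per worker, once,
# before any foreach loop runs.
clusterEvalQ(cl, {
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")
  NULL  # return NULL so the connection isn't serialized back to the master
})

# Each worker reuses its own 'con' across many queries.
results <- foreach(q = queries) %dopar% {
  dbGetQuery(con, q)
}

clusterEvalQ(cl, dbDisconnect(con))
stopCluster(cl)
```

The catch is that `clusterEvalQ` comes from the parallel package and needs the cluster object `cl`, so this only works when I control the backend and know it's a doParallel cluster.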

I'd like to run code that works with arbitrary foreach backends. Because connection objects can't be serialized, I can't create them on the parent process and then export them to the workers. Each worker needs to create its own database connection.

Within the foreach package, is there any way to pass a piece of "worker initialization" code that gets run once per worker, before the foreach loop is evaluated?
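The best backend-agnostic workaround I've come up with (a sketch of my own, not something from the linked answer) is to lazily create the connection inside the loop body, caching it in the worker's global environment so repeated iterations on the same worker reuse it:

```r
library(foreach)

# Hypothetical helper: connect on first use, then reuse the cached
# connection. Database path and DBI/RSQLite are placeholders.
get_worker_con <- function() {
  if (!exists(".worker_con", envir = globalenv())) {
    assign(".worker_con",
           DBI::dbConnect(RSQLite::SQLite(), "mydb.sqlite"),
           envir = globalenv())
  }
  get(".worker_con", envir = globalenv())
}

results <- foreach(q = queries,
                   .packages = "DBI",
                   .export = "get_worker_con") %dopar% {
  con <- get_worker_con()
  DBI::dbGetQuery(con, q)
}
```

This runs under any backend, but it pays the initialization cost on the first iteration per worker rather than up front, and there's no clean hook for closing the connections afterwards, which is why I'm hoping foreach has a proper "worker initialization" mechanism.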

Zach
  • I'm not sure I understand your question. The linked answer says to use `clusterEvalQ` - `clusterEvalQ evaluates a literal expression on each cluster node. It is a parallel version of evalq, and is a convenience function invoking clusterCall`. Isn't that precisely what you're asking for? – Andrie Aug 04 '14 at 15:07
  • @Andrie Yes, except it only works with `doParallel`. I'd like to find or write a function like this that I can use with arbitrary parallel backends. – Zach Aug 04 '14 at 16:35
  • In the linked answer @steveweston (author of foreach, doMC, etc.) indicates there is no backend-independent way of doing this. – Andrie Aug 05 '14 at 07:04

0 Answers