I'd like to use foreach to run SQL queries in parallel. As described in this answer, it's possible to do this by using the doParallel package to set up one database connection per worker and then using foreach to run queries on the workers. As noted in that answer, reusing one connection for multiple queries is more efficient than opening a new connection for each query: creating and destroying SQL connections carries some overhead, and you run the risk of overwhelming the server with connections.
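For reference, here is a minimal sketch of that doParallel approach as I understand it. The in-memory SQLite database (via RSQLite), the two-worker cluster, and the toy query are placeholders I've assumed for illustration; they are not taken from the linked answer.

```r
library(doParallel)  # also attaches foreach and parallel
library(DBI)

cl <- makeCluster(2)
registerDoParallel(cl)

# Runs once per worker: open a connection and keep it in the worker's
# global environment. Assumes the RSQLite package is installed.
clusterEvalQ(cl, {
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  NULL  # return NULL so the connection is not serialized back to the parent
})

# Each task reuses the connection that already exists on its worker.
# `con` is never defined on the parent; it is looked up on the worker.
results <- foreach(i = 1:4, .combine = rbind) %dopar% {
  dbGetQuery(con, sprintf("SELECT %d AS i, %d * 2 AS doubled", i, i))
}

# Close the per-worker connections, then shut the cluster down.
clusterEvalQ(cl, {
  dbDisconnect(con)
  NULL
})
stopCluster(cl)
```

The initialization step here goes through clusterEvalQ on the cluster object, so it is tied to the doParallel/parallel backend rather than to foreach itself.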
I'd like to write code that works with arbitrary foreach backends. Because connection objects can't be serialized, I can't create them on the parent process and then export them to the workers. Each worker needs to create its own database connection.
Within the foreach package, is there any way to pass a piece of "worker initialization" code that gets run once per worker, before the foreach loop is evaluated?