I'm running a number of processes using multiprocessing.Pool.
Each process has to query my MySQL database.
I currently connect to the database once and then share the connection between the processes.
It works, but occasionally I get strange errors. I've confirmed that the errors occur when querying the database.
I figure the problem is that the same connection is used by all the processes.
- Is this correct?
While looking for an answer I stumbled upon this Q&A: How to share a single MySQL database connection between multiple processes in Python
So I looked up the class pooling.MySQLConnectionPool:
- http://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
- http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnectionpool.html
- http://dev.mysql.com/doc/connector-python/en/connector-python-api-pooledmysqlconnection.html
If I understand this correctly, I should set up a pool with a number of connections and share the pool between processes. Each process then looks into that pool and, if a connection is available, uses it, or else waits until a connection is freed.
- Is this correct?
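To make my mental model concrete, here's a toy version of that blocking behaviour using only the stdlib (a queue.Queue of sqlite3 connections standing in for mysql.connector's pooling, so it runs without a live MySQL server; this shows only the single-process picture, which may be exactly where it breaks down across processes):

```python
import queue
import sqlite3

# Toy illustration of pool semantics (not mysql.connector's actual API):
# a fixed set of connections held in a queue; borrowers block until one is free.
POOL_SIZE = 2
pool = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

def with_connection(sql):
    conn = pool.get()          # blocks if all connections are checked out
    try:
        return conn.execute(sql).fetchone()[0]
    finally:
        pool.put(conn)         # return the connection to the pool

print(with_connection("SELECT 41 + 1"))
```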
But then I found this Q&A: Accessing a MySQL connection pool from Python multiprocessing
At first it seems that "mata" confirms what I suspected, but at the same time he dismisses setting up a pool to be shared between processes:
sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),
Instead he suggests
so each process using it's own connections is actually what you should aim for.
What does that mean?
- Should I create a single connection for each worker? Then what are MySQL pools good for?
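For reference, here is what I think "each process using its own connection" looks like with Pool's initializer (sqlite3 stands in for mysql.connector.connect, which would need a live server; the names init and query are just my placeholders):

```python
import multiprocessing
import sqlite3  # stand-in for mysql.connector, which needs a running server

# One module-level "connection" per worker process, created in the initializer.
_conn = None

def init():
    global _conn
    # With MySQL this would be: _conn = mysql.connector.connect(...)
    _conn = sqlite3.connect(":memory:")
    _conn.execute("CREATE TABLE t (x INTEGER)")
    _conn.execute("INSERT INTO t VALUES (1), (2), (3)")

def query(n):
    # Each worker reuses its own private connection; nothing is shared.
    cur = _conn.execute("SELECT SUM(x) + ? FROM t", (n,))
    return cur.fetchone()[0]

def run():
    with multiprocessing.Pool(processes=2, initializer=init) as p:
        return p.map(query, [0, 10])

if __name__ == "__main__":
    print(run())
```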
The example given by mata in his answer seems reasonable enough, but I don't understand the pool being set up inside the init function passed as the initializer argument
p = Pool(initializer=init)
- Why? (As ph_singer points out in the comments, this is not a good solution)
Would changing the blocking Pool.map() method to Pool.map_async() and sending a connection from the pool to map_async(q, ConnObj) suffice?
- Is this correct?
In the comments it's mentioned that
The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access communicate with it using a queue
UPDATE: Found this, which seems to agree: https://stackoverflow.com/a/26072257/1267259
If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.
Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.
- Is this really correct?
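As I understand that suggestion, it would look roughly like this (fake_query is my placeholder for real MySQL access; with mysql.connector, the dedicated worker would open the single connection):

```python
import multiprocessing

def fake_query(sql):
    # Placeholder for cursor.execute(sql) against a real connection.
    return f"result of {sql}"

def db_worker(requests, responses):
    # The only process that ever touches the database.
    # conn = mysql.connector.connect(...)  # would be opened here
    while True:
        item = requests.get()
        if item is None:   # sentinel: shut down
            break
        responses.put(fake_query(item))

def run():
    requests = multiprocessing.Queue()
    responses = multiprocessing.Queue()
    worker = multiprocessing.Process(target=db_worker, args=(requests, responses))
    worker.start()
    # Other processes (here: the parent) talk to the DB only via the queues.
    requests.put("SELECT 1")
    result = responses.get()
    requests.put(None)
    worker.join()
    return result

if __name__ == "__main__":
    print(run())
```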
Isn't the purpose of a MySQL pool to handle requests from the processes and relay them to an available connection?
Now I'm just confused...