
I recently switched some of the background functions of a piece of software I am working on from the Python threading module to the multiprocessing module in order to take advantage of more CPU cores. Most of the transition went without a hitch, but the database integration has been giving me significant trouble.

Originally, I was using a single SimpleConnectionPool object from psycopg2, which sat as a global variable in a module called db that also handles some boilerplate database operations. From my understanding, forking a second Python process copies the parent's current memory, including those open connections, into the new process. Because this causes issues with the database connections, I added a function called init() to my db module which simply re-initializes the SimpleConnectionPool and assigns it to the global variable. My thinking was that if I called this init() function from within the second process, it would create a fresh set of connections for the pool in that process only. The main process would therefore keep its own set of connections, separate from those of the second process.
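
A simplified sketch of that setup (module contents trimmed; connection details are placeholders):

    # db.py -- simplified sketch; DSN values are placeholders
    from psycopg2.pool import SimpleConnectionPool

    pool = None

    def init(minconn=1, maxconn=5):
        """(Re-)create the module-level pool. Called once at import time
        in the main process, and again inside each child process."""
        global pool
        pool = SimpleConnectionPool(
            minconn, maxconn,
            host="localhost", dbname="mydb",
            user="myuser", password="secret",
        )

    def execute(query, params=None):
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
                conn.commit()
        finally:
            pool.putconn(conn)

    init()  # build the initial pool in the main process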

However, using this method I was frequently getting the following exception:

    OperationalError: SSL error: decryption failed or bad record mac

This originated directly from "state = conn.poll()" in psycopg2_patcher.py. I did a little digging, and from what I can tell, the error is only thrown if the main process and the secondary process attempt to execute a query at the same time. I was thinking of reverting to a single connection pool in the main process and using Queues to pass queries from the secondary process to the main process for execution, but that comes with a lot of headache I would rather avoid.
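
For reference, that workaround would look roughly like this (a minimal sketch; a real version would also have to return results and handle errors, which is most of the headache). Here db refers to the module sketched above:

    import db  # the module sketched above
    from multiprocessing import Process, Queue

    def worker(query_queue):
        # Runs in the child process; never touches a connection itself.
        query_queue.put(("INSERT INTO events (name) VALUES (%s)", ("tick",)))

    if __name__ == "__main__":
        query_queue = Queue()
        p = Process(target=worker, args=(query_queue,))
        p.start()
        query, params = query_queue.get()  # received in the main process
        db.execute(query, params)          # only the main process uses the pool
        p.join()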

I also tried moving away from connection pools in the secondary process and instead used a single connection that is established only when a query needs to be executed, then closed directly afterwards. The same error occurred when the main process tried to execute a query at around the same time.
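
In other words, the secondary process did something like this (again a sketch; credentials are placeholders):

    import psycopg2

    def run_query(query, params=None):
        # Open a short-lived connection, use it once, close it.
        conn = psycopg2.connect(host="localhost", dbname="mydb",
                                user="myuser", password="secret")
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
                conn.commit()
        finally:
            conn.close()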

What do I need to do to the PostgreSQL server or to my implementation to allow different processes to issue queries simultaneously with the same credentials? I get the feeling I am going about the database connections between processes in a wholly unnecessary and convoluted way.

Evan McCoy

1 Answer


I have an elegant answer for this problem here, where you can just specify the connection count and all of the async connections are handled for you. It is a modified version of the ThreadedConnectionPool:

Python Postgres psycopg2 ThreadedConnectionPool exhausted
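
I won't duplicate the linked code here, but the key idea is the same either way: build the pool inside each process after it has started, so no two processes ever share a socket. A minimal sketch of that pattern with the stock ThreadedConnectionPool (the linked version adds handling for pool exhaustion; connection details are placeholders):

    from multiprocessing import Process
    from psycopg2.pool import ThreadedConnectionPool

    def worker():
        # Created *after* the process starts, so these sockets
        # belong to this process alone.
        pool = ThreadedConnectionPool(1, 5, host="localhost", dbname="mydb",
                                      user="myuser", password="secret")
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
        finally:
            pool.putconn(conn)
            pool.closeall()

    if __name__ == "__main__":
        Process(target=worker).start()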

eatmeimadanish
  • The original psycopg2 ThreadedConnectionPool isn't set up to handle multiple processes, only multiple threads. Did you address that issue in your modifications? If so, I will definitely try this out. Thank you! – Evan McCoy Dec 04 '18 at 21:08
  • I use GIPC with gevent to launch multiple webservices per CPU. They each use their own connection pool at that point, which is fine. I like that I know exactly how many connections each webserver holds; it allows me to predict scaling correctly. – eatmeimadanish Dec 04 '18 at 22:49
  • But with GIPC you can pass data back and forth between the processes, so theoretically you could run several async webservices and call back to a single async connection pool (roughly the pattern sketched below). – eatmeimadanish Dec 04 '18 at 22:50
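
For illustration, the GIPC pattern described in the comments above would look roughly like this (a sketch, not the commenters' actual code; connection details are placeholders):

    import gipc
    from psycopg2.pool import ThreadedConnectionPool

    def webservice(readend):
        # Each child builds its own pool after gipc spawns it, so the
        # number of connections per process is fixed and predictable.
        pool = ThreadedConnectionPool(1, 5, host="localhost", dbname="mydb",
                                      user="myuser", password="secret")
        query = readend.get()  # work sent over the pipe from the parent
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(query)
        finally:
            pool.putconn(conn)
            pool.closeall()

    if __name__ == "__main__":
        with gipc.pipe() as (readend, writeend):
            proc = gipc.start_process(target=webservice, args=(readend,))
            writeend.put("SELECT 1")
            proc.join()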