
I have a Python script running as a daemon. At startup, it spawns 5 processes, each of which connects to a Postgres database. To reduce the number of DB connections (which will eventually grow quite large), I am trying to find a way to share a single connection across multiple processes. For this purpose I am looking at the multiprocessing.sharedctypes.Value API. However, I am not sure how I can pass a psycopg2.connection object across processes using this API. Can anyone tell me how it might be done?

I'm also open to other ideas in order to solve this issue.

The reason I did not simply pass the connection to the 5 processes via their constructors is mutual exclusion: with that approach, I am not sure how to prevent more than one process from using the connection at the same time. Can someone tell me if this is the right thing to do?

  • It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 05:19

1 Answer


You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.

If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.

Python's multiprocessing queues, pipes, etc. offer appropriate messaging features for that.
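A minimal sketch of that database-worker pattern follows. To keep the example self-contained and runnable, sqlite3 stands in for psycopg2; the queue protocol (the `put`/`get` operations and the shutdown sentinel) is illustrative, not a real API.

```python
import multiprocessing as mp
import sqlite3


def db_worker(request_q, result_q):
    # This process is the ONLY one that ever touches the connection,
    # so no cross-process locking is needed.  (sqlite3 stands in for
    # psycopg2 here so the sketch is self-contained.)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
    while True:
        msg = request_q.get()
        if msg is None:          # shutdown sentinel
            break
        op, args = msg
        if op == "put":
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", args)
            conn.commit()
            result_q.put("ok")
        elif op == "get":
            row = conn.execute("SELECT v FROM kv WHERE k = ?", args).fetchone()
            result_q.put(row[0] if row else None)
    conn.close()


def run_demo():
    request_q, result_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=db_worker, args=(request_q, result_q))
    worker.start()
    # Any number of other worker processes could put requests on
    # request_q; here the parent process plays that role.
    request_q.put(("put", ("answer", 42)))
    assert result_q.get() == "ok"
    request_q.put(("get", ("answer",)))
    value = result_q.get()
    request_q.put(None)          # tell the DB worker to shut down
    worker.join()
    return value


if __name__ == "__main__":
    print(run_demo())  # prints 42
```

A real version would route replies back to the requesting process (e.g. one result queue per requester, or a request ID in each message) rather than share a single result queue.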

Craig Ringer
  • Your second sentence seems in conflict with psycopg2 docs that [say](http://initd.org/psycopg/docs/usage.html#thread-and-process-safety): _many threads can access the same database either using separate sessions and creating a connection per thread or using the same connection and creating separate cursors_. Can you clarify? – max Nov 08 '16 at 08:37
  • 3
    psycopg2 has an interlock that allows multiple threads to share a connection by preventing them from using it at the same time. It's pretty useless for concurrent workers that access the database a lot since you'll land up with all but one thread waiting at any given time. But it's handy if you have lots of workers that each need to occasionally access the DB. You're quite right that the latter is an alternative to a dedicated database worker pool. – Craig Ringer Nov 09 '16 at 00:36
  • This makes sense. You also basically answered [my other question](http://stackoverflow.com/questions/40478896/reusing-database-connection-for-multiple-requests), I think. – max Nov 09 '16 at 03:58
  • What if all your workers always end their job with a DB operation, would you still use the "database worker processes"? Another solution I saw creates a new connection in the process initialization, do you think it would be more suitable? – leoschet Jun 12 '18 at 12:05
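To illustrate the threads-sharing-one-connection case discussed in the comments above, here is a sketch of explicit lock-based serialization. psycopg2 performs this interlocking internally (with each thread using its own cursor); sqlite3 stands in so the example is self-contained, which is why the lock and `check_same_thread=False` are needed here.

```python
import sqlite3
import threading

# One shared connection, serialized by a lock.  sqlite3 normally forbids
# cross-thread use of a connection, hence check_same_thread=False.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE hits (worker_id INTEGER)")
conn_lock = threading.Lock()


def occasional_db_access(worker_id):
    # Workers that only touch the DB occasionally queue up briefly here;
    # workers that hit the DB constantly would mostly sit waiting on the
    # lock, which is the drawback noted in the answer's comments.
    with conn_lock:
        cur = conn.cursor()      # a separate cursor per use
        cur.execute("INSERT INTO hits VALUES (?)", (worker_id,))
        conn.commit()
        cur.close()


def run_workers(n=8):
    threads = [threading.Thread(target=occasional_db_access, args=(i,))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]


if __name__ == "__main__":
    print(run_workers())  # prints 8
```

Note this only works between threads of one process; it does not help across processes, which is why the answer recommends dedicated database worker processes instead.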