
If I don't need transactions, can I reuse the same database connection for multiple requests?

Flask documentation says:

Because database connections encapsulate a transaction, we also need to make sure that only one request at the time uses the connection.

Here's how I understand the meaning of the above sentence:

Python DB-API connection can only handle one transaction at a time; to start a new transaction, one must first commit or roll back the previous one. So if each of our requests needs its own transaction, then of course each request needs its own database connection.

Please let me know if I got it wrong.

But let's say I set autocommit mode and handle each request with a single SQL statement. Or, alternatively, let's say I only read from - not write to - the database. In either case, it seems I can just reuse the same database connection for all my requests and save the overhead of opening multiple connections. But I'm not sure if there's any downside to this approach.
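
The autocommit, one-statement-per-request idea can be sketched as follows. This is a minimal illustration using the standard-library `sqlite3` module (standing in for psycopg2; with psycopg2 you would set `conn.autocommit = True` instead of `isolation_level=None`), and `handle_request` is a hypothetical request handler, not Flask API:

```python
import sqlite3

# A single shared connection in autocommit mode (sqlite3's
# isolation_level=None); each "request" issues exactly one statement,
# so no transaction is ever left open between requests.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE hits (path TEXT)")

def handle_request(path):
    # One statement per request, committed immediately.
    conn.execute("INSERT INTO hits (path) VALUES (?)", (path,))

for p in ["/a", "/b", "/a"]:
    handle_request(p)

count = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
print(count)  # 3
```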

Edit: I can see one issue with what I'm proposing: each request might be handled by a different process. Since connections should probably not be reused across processes, let me clarify my question: I mean creating one connection per process, and using it for all requests that happen to be handled by this process.
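
To make the "one connection per process" idea concrete, here is a sketch (again with `sqlite3` standing in for a real driver; `get_connection` and the PID-keyed cache are illustrative, not any framework's API). Keying by PID means a forked worker lazily creates its own connection rather than inheriting the parent's:

```python
import os
import sqlite3

# One connection per process: created lazily, cached by PID so that a
# forked worker process does not reuse the parent's connection.
_conns = {}

def get_connection():
    pid = os.getpid()
    if pid not in _conns:
        _conns[pid] = sqlite3.connect(":memory:")
    return _conns[pid]

# All requests handled within this process reuse the same connection:
same = get_connection() is get_connection()
print(same)  # True
```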

On the other hand, the whole point of (green or native) threads is usually to serve one request per thread, so my proposed approach implies sharing a connection across threads. It seems one connection can be used concurrently from multiple native threads, but not from multiple green threads.

So let's say for concreteness my environment is flask + gunicorn with multiple multi-threaded sync workers.


1 Answer

Based on @Craig Ringer's comment on a different question, I think I know the answer.

The only possible advantage of connection sharing is performance (other factors - like transaction encapsulation and simplicity - favor a separate connection per request). And since a connection can't be shared across processes or green threads, it only has a chance with native threads. But psycopg2 (and presumably other drivers) doesn't allow truly concurrent access from the same connection: it serializes queries, so threads end up waiting on each other. So unless each request spends very little time talking to the database, connection sharing likely brings a performance hit, not a benefit.
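
Given that, the usual arrangement in a multi-threaded sync worker is a connection per thread rather than a shared one. A sketch using `threading.local` (with `sqlite3` standing in for psycopg2; `check_same_thread=False` is a sqlite3-specific flag needed only for this demo, with no psycopg2 equivalent):

```python
import sqlite3
import threading

# Each native thread (one per request in a threaded sync worker) lazily
# gets its own connection, so threads never contend for a shared one.
_local = threading.local()

def get_conn():
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(":memory:", check_same_thread=False)
    return _local.conn

results = {}

def worker(name):
    results[name] = get_conn()  # keep the object so identities stay distinct

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The two threads saw two different connection objects:
print(results[0] is results[1])  # False
```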

    There's one other possible case where sharing connections is desirable, and that's when you want different workers to see the same dirty (uncommitted) data, or share session state like TEMPORARY tables, WITH HOLD cursors, etc. But it's a bit of a stretch. – Craig Ringer Nov 09 '16 at 04:00
  • @CraigRinger but different workers are different processes (at least in gunicorn), and if all share the same connection (and actually use it), I thought they would collide with each other (per warnings quoted in my question). Or did you mean one could use locks to avoid that problem? – max Nov 09 '16 at 10:17
  • If they're different *processes* they cannot share connections. (OK, so if you fork without exec or pass the socket to a child process it's possible, but will only lead to chaos, it won't work in any useful way). – Craig Ringer Nov 09 '16 at 14:03
  • @CraigRinger That's precisely my use case. How would you recommend doing it? – DylanYoung Nov 14 '19 at 20:27
    @DylanYoung If I understand you right: Run one thread whose job is talking to the database. Broker requests to it from other threads using queues, and deliver replies using queues. You might want to wrap that in a Future or similar to make it a nicer interface. Post a new question and include a link back to this question in the new question for details, someone will likely explain. – Craig Ringer Dec 06 '19 at 04:23
  • @Craig Ringer I was worried you were going to say that ;) It's in the context of trying to run Django live-server tests with transaction-based isolation, the issue being that the worker running the live server can't see the data changes. I wonder if it could be written as a custom database driver, actually. I'll think on it (it could possibly be contributed back to Django too). Thanks for the tip! – DylanYoung Dec 07 '19 at 23:24
  • @DylanYoung Python's `psycopg2` does some locking around cursors IIRC that should let you just share one connection (or is it one cursor?) freely between multiple threads so long as you're aware of the risk of deadlocks between threads. Verify that before you rely on it. – Craig Ringer Dec 11 '19 at 03:02
  • @DylanYoung Failing that, you could write a wrapper driver rather simply, especially if you use some metaprogramming to wrap all entrypoint methods in `psycopg2` – Craig Ringer Dec 11 '19 at 03:03
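
The broker pattern Craig Ringer describes in the comments above can be sketched as follows. One dedicated thread owns the connection; other threads submit SQL through a queue and receive results via a `Future`. This is an illustrative sketch (`sqlite3` stands in for psycopg2; `run_query` and the tuple protocol on the queue are made-up names, not a library API):

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future

requests = queue.Queue()

def db_thread():
    # The connection lives entirely in this thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (41)")
    while True:
        item = requests.get()
        if item is None:          # shutdown sentinel
            break
        sql, params, fut = item
        try:
            fut.set_result(conn.execute(sql, params).fetchall())
        except Exception as exc:
            fut.set_exception(exc)

def run_query(sql, params=()):
    # Called from any request thread; blocks until the DB thread replies.
    fut = Future()
    requests.put((sql, params, fut))
    return fut.result()

broker = threading.Thread(target=db_thread, daemon=True)
broker.start()
rows = run_query("SELECT x + 1 FROM t")
requests.put(None)                # stop the broker
print(rows)  # [(42,)]
```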