
We have a Python application with over twenty modules, most of which are shared by several web and console applications.

I've never had a clear understanding of the best practice for establishing and managing database connections in multi-module Python apps. Consider this example:

I have a module defining an object class for Users. It has many defs for creating/deleting/updating users in the database. The users.py module is imported into 1) a console-based utility, 2) a web.py-based web application, and 3) a constantly running daemon process.

Each of these three applications has a different life cycle. The daemon can open a connection and keep it open. The console utility connects, does its work, then dies. The HTTP requests are atomic, of course, but the web server itself is a daemon.

I am currently opening, using, then closing a connection inside each function in the Users class. This seems the most inefficient, but it works in all cases. An alternative I tried was declaring and opening a global connection for the entire module. Another option would be to create the connection at the top application layer and pass references down when instantiating classes, but this seems the worst idea to me.

I know every application architecture is different. I'm just wondering if there's a best practice, and what it would be?

Shog9
hikaru
  • Look at connection pooling as per this [question](http://stackoverflow.com/questions/98687/what-is-the-best-solution-for-database-connection-pooling-in-python) – mmmmmm Feb 14 '13 at 20:48

2 Answers


The best method is to open a connection when you need to do some operations (like getting and/or updating data); manipulate the data; write it back to the database in one query (very important for performance), and then close the connection. Opening a connection is a fairly light process.
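This open-work-write-close pattern can be sketched with the stdlib sqlite3 module (the table and column names here are hypothetical, and the same shape applies to MySQL drivers):

```python
import sqlite3

# A minimal sketch of the pattern described above; `users` and its
# columns are hypothetical examples, not part of the original code.
def update_user_email(db_path, user_id, new_email):
    conn = sqlite3.connect(db_path)   # open just before the work
    try:
        with conn:                    # commits on success, rolls back on error
            conn.execute(
                "UPDATE users SET email = ? WHERE id = ?",
                (new_email, user_id),
            )                         # one write-back query
    finally:
        conn.close()                  # close as soon as the work is done
```

The `with conn:` block wraps the write in a transaction, so the single query either lands completely or not at all.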

Some pitfalls for performance include:

  • opening the database when you won't definitely interact with it
  • using selectors that take more data than you need (e.g., getting data about all users and filtering it in Python, instead of asking MySQL to filter out the useless data)
  • writing values that haven't changed (e.g. updating all values of a user profile, when just their email has changed)
  • having each field update the server individually (e.g., open the db, update the user email, close the db, open the db, update the user password, close the db, open th... you get the idea)

The bottom line is that it doesn't matter how many times you open the database, it's how many queries you run. If you can get your code to join related queries, you've won the battle.
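As a sketch of joining related queries, a hypothetical profile-update helper can fold every changed field into one UPDATE instead of one round trip per field (the `users` table and the assumption that column names come from trusted code, not user input, are mine):

```python
import sqlite3

def save_profile_changes(conn, user_id, changes):
    """changes: dict of column -> new value, containing only changed fields.

    Column names are interpolated into the SQL, so they must come from
    trusted code; the values themselves are passed as bound parameters.
    """
    if not changes:
        return  # nothing changed: don't touch the database at all
    cols = ", ".join(f"{col} = ?" for col in changes)
    conn.execute(
        f"UPDATE users SET {cols} WHERE id = ?",
        (*changes.values(), user_id),
    )
```

This covers three of the pitfalls at once: it skips the database entirely when nothing changed, writes only the changed values, and issues a single query rather than one per field.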

Brigand
  • This is what I was looking for, confirms my best guess, and means I don't have to rework anything! Thanks for the extra tips - I'm already doing all that but it was a nice checklist. – hikaru Feb 14 '13 at 21:08

MySQL connections are relatively fast, so this might not be a problem (i.e. you should measure). Most other databases take much more resources to create a connection.

Creating a new connection when you need one is always the safest choice, and a good first one. Some db libraries, e.g. SQLAlchemy, have connection pools built in that will transparently and correctly re-use connections for you.

If you decide you want to keep a connection alive so that you can re-use it, there are a few points to be aware of:

  1. Connections that are only used for reading are easier to re-use than connections that you've used to modify database data.

  2. When you start a transaction on a connection, be careful that nothing else can use that connection for something else while you're using it.

  3. Connections that sit around for a long time get stale and can be closed from underneath you, so if you're re-using a connection you'll need to check if it is still "alive", e.g. by sending "select 1" and verifying that you get a result.
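The staleness check in point 3 can be sketched like this (the helper name and the reconnect callback are illustrative, not from any particular library):

```python
import sqlite3

def ensure_alive(conn, reconnect):
    """Return conn if it still answers a cheap query, else a fresh connection.

    `reconnect` is a zero-argument callable that opens a new connection;
    shown with sqlite3, but the same idea applies to MySQL drivers.
    """
    try:
        conn.execute("SELECT 1").fetchone()  # cheap liveness probe
        return conn
    except sqlite3.Error:
        return reconnect()  # connection went stale: replace it
```

A long-lived process would call this before each unit of work instead of trusting that yesterday's connection is still open.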

I would personally recommend against implementing your own connection pooling algorithm. It's really hard to debug when things go wrong. Instead, choose a db library that does it for you.

thebjorn