
I have developed some custom DAO-like classes to meet some very specialized requirements for my project, which is a server-side process that does not run inside any kind of framework.

The solution works great except that every time a new request is made, I open a new connection via MySQLdb.connect.

What is the best "drop-in" solution to switch this over to using connection pooling in Python? I am imagining something like the Commons DBCP solution for Java.

The process is long running and has many threads that need to make requests, but not all at the same time... specifically they do quite a lot of work before brief bursts of writing out a chunk of their results.

Edited to add: After some more searching I found antipool.py, which looks decent, but as I'm relatively new to Python I just want to make sure I'm not missing a more obvious/more idiomatic/better solution.

John

9 Answers


In MySQL?

I'd say don't bother with connection pooling. It's often a source of trouble, and with MySQL it's not going to bring you the performance advantage you're hoping for. This road may be a lot of effort to follow, politically, because there's so much best-practices hand-waving and textbook verbiage in this space about the advantages of connection pooling.

Connection pools are simply a bridge between the post-web era of stateless applications (e.g. the HTTP protocol) and the pre-web era of stateful, long-lived batch-processing applications. Connections were very expensive in pre-web databases, because no one cared much about how long a connection took to establish, so post-web applications devised the connection-pool scheme so that every hit didn't incur that huge setup overhead on the RDBMS.

Since MySQL is more of a web-era RDBMS, connections are extremely lightweight and fast. I have written many high volume web applications that don't use a connection pool at all for MySQL.

This is a complication you may benefit from doing without, so long as there isn't a political obstacle to overcome.
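For what it's worth, going pool-free is also the simplest code: each request opens a fresh connection and closes it when done. A minimal sketch of that pattern, shown with the stdlib sqlite3 module standing in for MySQLdb so the example is self-contained (with MySQLdb you'd call MySQLdb.connect instead):

```python
import sqlite3
from contextlib import closing

def run_query(sql):
    # Open a fresh connection per request -- cheap for a web-era RDBMS.
    # sqlite3 is only a stand-in; the pattern is identical with MySQLdb.
    with closing(sqlite3.connect(":memory:")) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()

print(run_query("SELECT 1 + 1"))
```

The `closing()` wrappers guarantee the connection and cursor are released even if the query raises.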

mbac32768
    8 years after this answer was posted and pooling continues to remain relevant. If you run a webapp with heavy traffic, you can easily run into the "Too many connections" limit, regardless of its statelessness. A pool will help mitigate this by waiting for a free connection instead of hard-failing. Also, if you want to scale your app server horizontally, your database probably isn't going to live on the same machine. In this case, you most likely want to connect to it over HTTPS, which has significant overhead. A pool will help here too. – Joe Sep 13 '16 at 17:58
    @Joe, you probably meant TLS/SSL, not HTTPS. – Stanislav Bashkyrtsev May 26 '23 at 10:13

Wrap your connection class.

Set a limit on how many connections you make. Return an unused connection. Intercept close to free the connection.
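A minimal sketch of those three steps. `ConnectionPool` and `_PooledConnection` are illustrative names of my own, and `connect` stands in for the real driver call (e.g. a zero-argument callable wrapping MySQLdb.connect):

```python
import queue
import threading

class ConnectionPool:
    """Cap the number of connections, reuse idle ones, intercept close()."""

    def __init__(self, connect, limit=5):
        self._connect = connect                   # factory for new raw connections
        self._idle = queue.Queue()                # connections not currently in use
        self._slots = threading.Semaphore(limit)  # caps total connections checked out

    def get(self):
        self._slots.acquire()                     # block if `limit` are already in use
        try:
            conn = self._idle.get_nowait()        # prefer an unused connection
        except queue.Empty:
            conn = self._connect()                # none idle: open a new one
        return _PooledConnection(conn, self)

    def _release(self, conn):
        self._idle.put(conn)
        self._slots.release()

class _PooledConnection:
    """Proxy whose close() returns the connection to the pool."""

    def __init__(self, conn, pool):
        self._conn = conn
        self._pool = pool

    def close(self):
        self._pool._release(self._conn)           # intercepted: free, don't destroy

    def __getattr__(self, name):
        return getattr(self._conn, name)          # delegate cursor(), commit(), ...
```

Code that dutifully calls close() on the proxy just hands the underlying connection back for reuse, which is the "intercept close" step.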

Update: I put something like this in dbpool.py:

import sqlalchemy.pool as pool
import MySQLdb as mysql
mysql = pool.manage(mysql)  # connect() calls on the proxy now draw from a managed pool
dpoggi
Chris
    Chris, surely someone has built this already? Worst case I can write it myself but obviously this should be a fairly common requirement for people not using existing ORMs/frameworks, and I'm sure someone else has already created a solution that has been proven over time? – John Sep 19 '08 at 01:42
  • I have done this before, with Oracle, and I think it involved fewer than 50 lines of code, total. Basically, use a dictionary keyed by an id that stores each connection and its in-use status. Very simple. – Chris Sep 19 '08 at 01:56
  • @Chris, by that chain of logic, I should start implementing my own hashmaps and lists as well. – Martin Konecny Oct 07 '13 at 21:31

IMO, the "more obvious/more idiomatic/better solution" is to use an existing ORM rather than invent DAO-like classes.

It appears to me that ORMs are more popular than "raw" SQL connections. Why? Because Python is object-oriented, and the mapping from a SQL row to an object is essential. There aren't many use cases where you deal with SQL rows that don't map to Python objects.

I think that SQLAlchemy or SQLObject (and the associated connection pooling) are the more idiomatic Pythonic solutions.

Pooling as a separate feature isn't very common because pure SQL (without object mapping) isn't popular for the kind of complex, long-running processes that benefit from connection pooling. Yes, pure SQL is used, but it tends to appear in simpler or more controlled applications where pooling isn't helpful.

I think you might have two alternatives:

  1. Revise your classes to use SQLAlchemy or SQLObject. While this appears painful at first (all that work seemingly wasted), you should be able to leverage all of the design and thought that went into it. It's merely an exercise in adopting a widely-used ORM and pooling solution.
  2. Roll your own simple connection pool using the algorithm you outlined -- a simple set or list of connections that you cycle through.
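Alternative 2 could be sketched as follows; `make_conn` is a hypothetical factory for whatever actually opens a connection, and the connections live in a plain list guarded by a condition variable:

```python
import contextlib
import threading

def make_pool(make_conn, size=4):
    """Pre-open `size` connections and hand them out in turn."""
    conns = [make_conn() for _ in range(size)]  # the list you cycle through
    cond = threading.Condition()

    @contextlib.contextmanager
    def acquire():
        with cond:
            while not conns:                    # everything checked out: wait
                cond.wait()
            conn = conns.pop()
        try:
            yield conn
        finally:
            with cond:
                conns.append(conn)              # cycle it back in
                cond.notify()

    return acquire
```

Each `with acquire() as conn:` block pops a connection and puts it back afterwards, even if the body raises.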
Hari
S.Lott
  • a) DAO/Repository patterns don't go against ORM. In fact you SHOULD create such layers whether you work with an ORM or with the DB-API. b) DB pools must exist outside of the ORM - it's just that the Python community doesn't know any better. In fact, if written properly, BOTH types of projects should use them - with or without an ORM. – Stanislav Bashkyrtsev Apr 28 '23 at 20:49

Old thread, but for general-purpose pooling (connections or any expensive object), I use something like:

import contextlib
import multiprocessing
import queue

def pool(ctor, limit=None):
    local_pool = multiprocessing.Queue()
    n = multiprocessing.Value('i', 0)
    @contextlib.contextmanager
    def pooled(ctor=ctor, lpool=local_pool, n=n):
        # block iff at limit
        try:
            i = lpool.get(block=limit is not None and n.value >= limit)
        except queue.Empty:
            n.value += 1
            i = ctor()
        yield i
        lpool.put(i)
    return pooled

It constructs objects lazily, takes an optional limit, and should generalize to any use case I can think of. Of course, this assumes that you really need pooling for whatever resource, which you may not for many modern SQL-likes. Usage:

# in main:
my_pool = pool(lambda: do_something())
# in thread:
with my_pool() as my_obj:
    my_obj.do_something()

This does assume that whatever object ctor creates has an appropriate destructor if needed (some servers don't kill connection objects unless they are closed explicitly).

metaperture
    You forgot two things: 1. `yield i` may raise exception, so you should wrap it with try...except. 2. `lpool.put(i)` may return object in wrong state (like db connection with opened transaction) – socketpair Jul 05 '16 at 09:06
  • The exception yielding should actually be handled by the context manager. No matter how the context is exited (exception or otherwise), the rest of the function will run. But yeah, if you are doing stateful manipulations on the db, handling that in the post-yield bit of the function would be a good idea. – metaperture Jul 05 '16 at 16:06
  • In practice, using the pool object in Chris's edited post is probably better, but for those looking to learn how to implement pools in general, I think this is a nice example. – metaperture Jul 05 '16 at 16:07

I've just been looking for the same sort of thing.

I've found pysqlpool and the SQLAlchemy pool module.

0xMH

Replying to an old thread, but the last time I checked, MySQL offers connection pooling as part of its Connector/Python driver.

You can check them out at :

https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html

From the linked docs, assuming you want to open a connection pool explicitly (as the OP stated):

import mysql.connector.pooling

dbconfig = {"database": "test", "user": "joe"}
cnxpool = mysql.connector.pooling.MySQLConnectionPool(pool_name="mypool",
                                                      pool_size=3,
                                                      **dbconfig)

The pool is then accessed by requesting a connection via the get_connection() method; calling close() on a connection obtained this way returns it to the pool rather than closing it.

cnx1 = cnxpool.get_connection()
cnx2 = cnxpool.get_connection()
kilokahn

Making your own connection pool is a BAD idea if your app ever decides to start using multi-threading. Making a connection pool for a multi-threaded application is much more complicated than one for a single-threaded application. You can use something like PySQLPool in that case.

It's also a BAD idea to use an ORM if you're looking for performance.

If you'll be dealing with huge/heavy databases that have to handle lots of selects, inserts, updates and deletes at the same time, then you're going to need performance, which means you'll need custom SQL written to optimize lookups and lock times. With an ORM you don't usually have that flexibility.

So basically, yes, you can make your own connection pool and use ORMs, but only if you're sure you won't need any of what I just described.

flexo

Use DBUtils, simple and reliable.

pip install DBUtils
adiga
ospider

I did this for OpenSearch, so you can refer to it.

    from opensearchpy import OpenSearch

    def get_connection():
        connection = None
        try:
            connection = OpenSearch(
                hosts=[{'host': settings.OPEN_SEARCH_HOST, 'port': settings.OPEN_SEARCH_PORT}],
                http_compress=True,
                http_auth=(settings.OPEN_SEARCH_USER, settings.OPEN_SEARCH_PASSWORD),
                use_ssl=True,
                verify_certs=True,
                ssl_assert_hostname=False,
                ssl_show_warn=False,
            )
        except Exception as error:
            print("Error: Connection not established {}".format(error))
        else:
            print("Connection established")
        return connection

    class OpenSearchClient(object):
        connection_pool = []
        connection_in_use = []

        def __init__(self):
            # build the pool once, on first instantiation
            if not OpenSearchClient.connection_pool:
                OpenSearchClient.connection_pool = [
                    get_connection() for _ in range(settings.CONNECTION_POOL_SIZE)
                ]

        def search_data(self, query="", index_name=settings.OPEN_SEARCH_INDEX):
            # borrow a connection from the front of the pool
            available_cursor = OpenSearchClient.connection_pool.pop(0)
            OpenSearchClient.connection_in_use.append(available_cursor)
            try:
                response = available_cursor.search(body=query, index=index_name)
            finally:
                # return the connection for reuse rather than closing it
                OpenSearchClient.connection_in_use.remove(available_cursor)
                OpenSearchClient.connection_pool.append(available_cursor)
            return response
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – dsillman2000 Jun 21 '22 at 15:29