
I have written some code for testing the performance of a database when users are simultaneously running queries against it. The objective is to understand how the elapsed time increases with the number of users. The code contains a class User (shown below) whose objects are created by parsing XML files.

import pyodbc

class User(object):

    def __init__(self, id, constr):
        self.id = id                # user identifier
        self.constr = constr        # ODBC connection string
        self.queryid = list()       # query ids, parallel to self.queries
        self.queries = list()       # SQL text of the queries

    def openConn(self):
        self.cnxn = pyodbc.connect(self.constr)
        logDet.info("%s %s" % (self.id, "Open connection."))

    def closeConn(self):
        self.cnxn.close()
        logDet.info("%s %s" % (self.id, "Close connection."))

    def executeAll(self):
        self.openConn()

        for n, qry in enumerate(self.queries):
            # create the cursor outside the try block so it is
            # guaranteed to exist if the except clause needs it
            cursor = self.cnxn.cursor()
            try:
                logTim.info("%s|%s|beg" % (self.id, self.queryid[n]))
                cursor.execute(qry)
                logTim.info("%s|%s|end" % (self.id, self.queryid[n]))
            except Exception:
                cursor.rollback()
                logDet.exception("Error while running query.")

        self.closeConn()

pyodbc is used for the connection to the database. Two loggers are used: a detailed one (logDet) and one that records only the timings (logTim). The User objects are stored in a list. The queries for each user are also in a plain list (not in a thread-safe Queue).
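The loggers are set up along these lines (a minimal sketch; the logger names, file names and format string here are placeholders, and logTim relies on the formatter's asctime for the timestamps that get diffed later):

import logging

def makeLogger(name, fname):
    # each logger writes to its own file, with a timestamp prefix
    # so the beg/end records in the timing log can be compared
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(fname)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger

logDet = makeLogger("detail", "detail.log")   # detailed log
logTim = makeLogger("timing", "timing.log")   # timings only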

To simulate parallel users, I have tried a couple of different approaches:

def worker(usr):
    usr.executeAll()

Option 1: multiprocessing.Pool

from multiprocessing import Pool

pool = Pool(processes=len(users))
pool.map(worker, users)

Option 2: threading.Thread

from threading import Thread

for usr in users:
    t = Thread(target=worker, args=(usr,))
    t.start()
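To know when all users have finished (e.g. before reading the logs), the threads also need to be joined; a minimal sketch of the same loop with joins:

threads = []
for usr in users:
    t = Thread(target=worker, args=(usr,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # block until every user has run all of its queries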

Both approaches work. In my test, I tried #users = 2, 6, ..., 60, with each user running 4 queries. Given how the query times are captured, there should be less than a second of delay between the end of one query and the beginning of the next, i.e. the queries should be fired one after the other. That is exactly what happens with multiprocessing, but with threading a random delay is introduced before the next query. The delay can be over a minute (screenshot of the timing log omitted).
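For reference, the per-query delays come from diffing consecutive timestamps in the timing log. A minimal sketch of that computation, assuming each line looks like 2014-08-02 19:41:00,123 u01|q03|beg (the exact timestamp layout and file name depend on how the logger is configured):

from datetime import datetime
from collections import defaultdict

events = defaultdict(list)  # user id -> [(timestamp, queryid, beg/end), ...]

with open("timing.log") as f:
    for line in f:
        stamp, msg = line[:23], line[24:].strip()
        uid, qid, mark = msg.split("|")
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        events[uid].append((ts, qid, mark))

# delay between the end of one query and the start of the next, per user
for uid, evts in events.items():
    for prev, nxt in zip(evts, evts[1:]):
        if prev[2] == "end" and nxt[2] == "beg":
            print(uid, prev[1], "->", nxt[1], (nxt[0] - prev[0]).total_seconds(), "s")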

Using: Python 3.4.1, pyodbc 3.0.7; clients running the code on Windows 7 / RHEL 6.5

I would really prefer to get this to work with threading. Is this behavior expected with the threading approach, or is there something I am missing? How could it be rewritten? Thanks.


1 Answer


When you use the threading-based approach, you're starting one thread per user, all the way up to 60 threads. All of those threads must fight for access to the GIL in between their I/O operations. This introduces a ton of overhead. You would probably see better results if you used a ThreadPool limited to a smaller number of threads (maybe 2 * multiprocessing.cpu_count()?), even with a higher number of users:

from multiprocessing.pool import ThreadPool
from multiprocessing import cpu_count

pool = ThreadPool(processes=cpu_count()*2)
pool.map(worker, users)

You may want to limit the number of concurrent processes you run, too, for memory-usage reasons. Starting up sixty concurrent Python processes is pretty expensive.
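For example, a capped pool might look like this (the min(len(users), cpu_count()) cap is just an illustration; the right number has to be found by testing):

from multiprocessing import Pool, cpu_count

# cap the pool instead of starting one process per user
pool = Pool(processes=min(len(users), cpu_count()))
pool.map(worker, users)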

  • Thanks @dano. I am a little confused by your comment "you may want to limit the number of concurrent processes you run too". I do not need to do that if I use your ThreadPool idea, do I? – ironv Aug 02 '14 at 19:41
  • @ironv If you use a `ThreadPool`, no. But if you choose to use a regular `multiprocessing.Pool`, you may want to use fewer than `len(users)` processes to reduce the memory overhead. You may be taxing the CPU more than you really want to with that many processes, too. – dano Aug 02 '14 at 19:44
  • Something else I observed: even in its original form (from my OP), the thread solution was working well on Windows while the multiprocessing.Pool solution was working well on RHEL. Before you posted, I was going with an os.name check for "nt" or "posix" and was accordingly invoking one or the other. I am testing the ThreadPool on Windows now and will then try it on Linux to see if there are delays. The other issue is that my RHEL box has 96 cores. Should I still go with cpu_count()*2? Or do a min(SOMELIMIT, cpu_count()*2)? – ironv Aug 02 '14 at 19:53
  • 96 cores?! I'm jealous. With that many cores, in a process pool-based approach I'd be worried about taxing memory too much with `2 * cpu_count()`. You're maxing out at 60 users, so you'll never use more than 60 processes anyway. You could probably do just fine with even fewer than that. It's really hard to say what number will be the sweet spot for you; it's the kind of thing you can only verify by testing with different values. – dano Aug 02 '14 at 20:00
  • Having read the Beazley presentation you referred to earlier, I have a question. In my script, the only work done by a thread is (i) grab and fire a query using pyodbc and then wait till it is done, (ii) write the end time to the log file. Repeat. Question: is the GIL released while the thread is waiting for the query to finish, or is it only released after a query has been completed? – ironv Aug 12 '14 at 19:49
  • @ironv The GIL is released while I/O is running - meaning while the query is actually sending data to the DB, and waiting for a reply. Once it gets the reply, the GIL must be acquired again. It will then be released again when you write the end time to the log file. – dano Aug 12 '14 at 20:04
  • Thanks @dano. Your code with ThreadPool had a very interesting result. When I did NOT use ThreadPool and I had 20 concurrent users on the database, the total elapsed time was 65mins. When I set ThreadPool(processes=4) meaning the database only had 4 concurrent users at a time, the total elapsed time for the test was 30mins (yes, less than half the time)!!! My test consists of 4 queries for each user. The code as written requires a user to fire all four queries sequentially before another user is started. – ironv Aug 12 '14 at 21:08
  • @ironv Right, 20 concurrent threads fighting for CPU time and the GIL introduces too much overhead, even with an I/O-bound operation. Very interesting to see how big the difference is! You could try adjusting the number of threads in the pool upwards to see if performance continues to improve, until a point is reached where it begins to degrade. – dano Aug 12 '14 at 23:21