
Is this a good multithreading pattern? It works, but it's so simple that I suspect there must be some hidden pitfalls. I would like to use it inside WSGI applications for asynchronous URL fetches.

I was inspired by GAE Asynchronous Requests.

```python
import datetime
import time
import threading

start_time = datetime.datetime.now()

def func(value):
    print 'START: {} {}'.format(value, datetime.datetime.now())
    time.sleep(5)
    print 'END: {} {}'.format(value, datetime.datetime.now())
    return str(value) * 10


class MyThread(threading.Thread):
    """Thread that runs func(*args, **kwargs) and keeps its return value."""

    def __init__(self, func, args=(), kwargs={}):
        super(MyThread, self).__init__()
        self.func = func
        self.args = args
        self.kwargs = kwargs
        self.result = None

    def run(self):
        # Executed in the new thread; stores the return value for later.
        self.result = self.func(*self.args, **self.kwargs)

    def get_result(self):
        # Wait for the thread to finish, then return what func returned.
        self.join()
        return self.result

def run_async(*args, **kwargs):
    # Start a MyThread immediately and return it as a handle to the result.
    t = MyThread(*args, **kwargs)
    t.start()
    return t

# This will be called inside a WSGI request handler only:

t1 = run_async(func=func, args=(1,))
t2 = run_async(func=func, args=(2,))
t3 = run_async(func=func, args=(3,))

print '\n'
print 'Do other stuff...'
print '\n'

print t1.get_result()
print t2.get_result()
print t3.get_result()

print '=' * 70
print 'Duration: {}'.format(datetime.datetime.now() - start_time)
```

The output is:

```
START: 1 2013-02-21 16:15:51.918112
START: 2 2013-02-21 16:15:51.918642
START: 3 2013-02-21 16:15:51.919138


Do other stuff...


END: 1 2013-02-21 16:15:56.918900
1111111111
END: 2 2013-02-21 16:15:56.924068
2222222222
END: 3 2013-02-21 16:15:56.924465
3333333333
```
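One pitfall the run above can't show: if `func` raises, the thread prints the traceback to stderr and ends, `self.result` stays `None`, and `get_result()` quietly returns `None` as if the call had succeeded. Below is a minimal sketch of the same class with that gap closed, assuming you want the worker's exception re-raised in the calling thread. It also swaps the mutable default `kwargs={}` for `None`, which is harmless in this code (the dict is never mutated) but a classic Python trap. `fail` is a hypothetical helper used only to trigger an error; the code is written to run on both 2.7 and 3.

```python
import threading


class MyThread(threading.Thread):
    """Runs func(*args, **kwargs) in a thread and keeps its outcome."""

    def __init__(self, func, args=(), kwargs=None):
        super(MyThread, self).__init__()
        self.func = func
        self.args = args
        # None instead of {}: a mutable default would be shared by all instances.
        self.kwargs = kwargs if kwargs is not None else {}
        self.result = None
        self.exc = None

    def run(self):
        try:
            self.result = self.func(*self.args, **self.kwargs)
        except Exception as e:
            # Keep the error so the caller can see it, instead of losing it.
            self.exc = e

    def get_result(self):
        # Block until run() has finished, then hand back result or error.
        self.join()
        if self.exc is not None:
            raise self.exc
        return self.result


def run_async(*args, **kwargs):
    t = MyThread(*args, **kwargs)
    t.start()
    return t


def fail():
    # Hypothetical worker that always fails.
    raise ValueError('boom')


ok = run_async(func=str.upper, args=('abc',))
bad = run_async(func=fail)
print(ok.get_result())  # ABC
try:
    bad.get_result()
except ValueError as e:
    print('caught: %s' % e)  # caught: boom
```

On Python 3 the re-raised exception keeps its original traceback; on 2.7 `raise self.exc` reports the traceback from `get_result()` instead.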
Peter Hudec
  • It looks like you're reinventing [`concurrent.futures`](http://docs.python.org/dev/library/concurrent.futures.html). (There are probably potential pitfalls, but the reason why threading is "hard" is that you can't say what they are outside the context of the whole program and how this code is used. In this case it would heavily depend on what shared resources `MyThread.func()` accesses.) – millimoose Feb 21 '13 at 15:35
  • Say I limit `MyThread.func()` to an `httplib.HTTPConnection.request()` call. Are there known scenarios where things can go wrong? – Peter Hudec Feb 21 '13 at 15:43
  • @millimoose You forgot to mention that concurrent.futures is new in Python 3.2. – freakish Feb 21 '13 at 15:56
  • @PeterHudec Your code is fine. Python is meant to be quick and simple. – freakish Feb 21 '13 at 15:56
  • Maybe. Given that it's an object-oriented API, I wouldn't expect `HTTPConnection` instances to share data though. – millimoose Feb 21 '13 at 15:56
  • It should only be called inside a WSGI request handler to speed things up. I noticed that it's Python 3.2. My target is 2.7. – Peter Hudec Feb 21 '13 at 16:04
  • @freakish The documentation states that pretty clearly for me. Also, there's nothing about Python the language that will protect you from threading bugs. You still have to read the docs and pore over your code to identify how resources are shared between threads. – millimoose Feb 21 '13 at 16:08
  • @millimoose Of course, but that is not related to OP's question. He asked about a multithreading pattern, not error handling. That's why I'm saying that the code is fine. – freakish Feb 21 '13 at 16:14
  • @freakish I'm not talking about error handling. The essence of writing thread-safe code is accessing shared resources concurrently in a way that will produce the correct result. This includes, but is not restricted to, not blowing up with an exception. This is also why it makes no sense to ask "is this code thread-safe" if it doesn't contain all such accesses to shared resources, nor does it make sense to say "Python will take care of this for you" because it can't. – millimoose Feb 21 '13 at 16:16
  • @millimoose But OP's not asking these questions. So all of this has nothing to do with the question. And by the way: an incorrect result *is* an error (i.e. error handling does include making code thread-safe). At least for me. – freakish Feb 21 '13 at 16:24
  • Correct me if I'm wrong (I'm fairly new to Python), but it seems like there are no shared resources involved in fetching URLs? The point about error handling seems like a good one. – Peter Hudec Feb 21 '13 at 16:28
  • @freakish The OP is asking "are there any problems my code might cause". My answer is "in the face of multithreading it's impossible to say" which I maintain is true, instead of trying to instill a sense of false security by only talking about the code I can see. Also that's a terrible definition of "error handling": I understand error handling as "how you recover from an error occurring" (how you *handle* an error that's happened), not "preventing errors" (that haven't happened yet and thus don't require being handled). – millimoose Feb 21 '13 at 16:39
  • @PeterHudec There's nothing preventing `httplib` from sharing some global / module-level state under the hood. It seems extremely unlikely, and it could arguably be considered a bug in the stdlib, but it could happen. Essentially I'm not saying "don't do this", I'm just trying to convey the mindset needed to write threadsafe code. – millimoose Feb 21 '13 at 16:40
  • @millimoose We can only analyze what we see. I don't instill anything. The code OP has shown us is fine - that's the objective truth. The pattern works. Now if he wants to ask about thread-safety of httplib, then there's another question about that: http://stackoverflow.com/questions/5825151/are-urllib2-and-httplib-thread-safe But why do you want me to answer a question which has not been asked? – freakish Feb 21 '13 at 17:19
  • @freakish Because it's a question that *should* be asked; i.e. the fact that composing "correct" pieces of multithreaded code might lead to incorrect code is **the** fundamental problem in concurrency, and the OP should be aware of this. In light of that fact only analyzing the code you can see when you know it's multithreaded makes no sense. Saying it's fine is misleading, because it's just impossible to tell. Futures are a useful concept and they make it very easy to correctly express a dependency between asynchronous tasks, but they do nothing for resource-sharing which must be considered. – millimoose Feb 21 '13 at 17:32
  • I asked the question because every example I found involved a queue, and I was just wondering whether I can do it this way without one. – Peter Hudec Feb 21 '13 at 17:34
  • @PeterHudec Yes, futures are a valid way to replace using queues / shared data and semaphores to communicate between threads. That said, yours might not be a correct implementation of futures, which is why I suggested just using the one in the stdlib (or at least looking at how that's implemented and modeling your solution after it). – millimoose Feb 21 '13 at 17:36
  • At least now I know **futures** is the pattern. – Peter Hudec Feb 21 '13 at 17:53
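For comparison, here is roughly what the question's example looks like with the stdlib futures module mentioned above. This is a sketch assuming Python 3.2+; on 2.7 the `futures` backport from PyPI provides the same `concurrent.futures` module. `fetch` is a hypothetical stand-in for the URL fetch, and `max_workers=3` is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(value):
    # Hypothetical stand-in for an httplib/urllib request.
    time.sleep(0.1)
    return str(value) * 10


start = time.time()
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules the call and immediately returns a Future.
    futures = [executor.submit(fetch, v) for v in (1, 2, 3)]

    # ... do other stuff while the calls run ...

    # result() blocks until that call has finished.
    results = [f.result() for f in futures]

print(results)  # ['1111111111', '2222222222', '3333333333']
print('Duration: %.2fs' % (time.time() - start))
```

Unlike the hand-rolled `MyThread`, `Future.result()` re-raises any exception the call raised and accepts an optional `timeout`, and leaving the `with` block waits for all submitted work to finish.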
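Since the queue-based examples came up: a minimal sketch of that style for comparison, assuming Python 3 (the module is named `Queue` on 2.7). Each worker puts an `(index, result)` pair on a shared `queue.Queue`, which does its own locking; `worker` and `fetch` are hypothetical names.

```python
import queue
import threading
import time


def fetch(value):
    # Hypothetical stand-in for a URL fetch.
    time.sleep(0.1)
    return str(value) * 10


def worker(index, value, out):
    # queue.Queue is thread-safe, so several workers can put() concurrently.
    out.put((index, fetch(value)))


out = queue.Queue()
values = [1, 2, 3]
threads = [threading.Thread(target=worker, args=(i, v, out))
           for i, v in enumerate(values)]
for t in threads:
    t.start()

# ... do other stuff ...

results = [None] * len(values)
for _ in values:
    index, result = out.get()  # blocks until some worker has finished
    results[index] = result
print(results)  # ['1111111111', '2222222222', '3333333333']
```

Results arrive in completion order, which is why the index travels with each one; a future-style handle avoids that bookkeeping. Note that a worker that raises never calls `put()`, so this loop would block forever; passing a `timeout` to `get()` or putting the exception on the queue avoids that.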
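To make the shared-resource point concrete: in the question's pattern each thread writes only its own `result`, so nothing is shared. As soon as workers touch common state (here a hypothetical per-process counter of fetched URLs), that access needs its own synchronization, however the threads were started. A minimal sketch with `threading.Lock`, runnable on 2.7 and 3:

```python
import threading


class Counter(object):
    """Hypothetical shared state, e.g. a per-process count of fetched URLs."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a read-modify-write, not one atomic step; without
        # the lock, two threads can read the same old value and lose an update.
        with self._lock:
            self.value += 1


counter = Counter()


def work(n):
    for _ in range(n):
        counter.increment()


threads = [threading.Thread(target=work, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

With the lock the total is always 40000; without it updates can be lost, although how often that actually shows up depends on the interpreter version and timing.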

0 Answers