
I have the following scenario:

I have a web service that upon a single user request aggregates data from some third party servers. The requests to the third parties can be SOAP or plain urllib2 requests with XML data, and each one is done in a separate thread.

Here is an overall picture of what I'm doing:

class ThirdParty1(Thread):
    def run(self):
        try:
            result = SOAPProxy('http://thirdparty.com', timeout=2).method(params)
            dostuff_and_save(result)  # save results on the database
        except Exception:
            log.warn('Ooops')

class ThirdParty2(Thread): ...

def myview(params):
    threads = [ThirdParty1(), ThirdParty2()]
    for t in threads: t.start()
    for t in threads: t.join(timeout=2)
    return result  # this is actually just a token that I use to retrieve the data saved by the threads

My current problem is reliably returning a response to the user's request when one of the third-party servers hangs on its side. I've tried setting a timeout on the thread join, on the SOAPProxy object, and via socket.setdefaulttimeout. None of the timeouts are respected.
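Part of the confusion is what join(timeout=...) actually does: it only bounds how long the caller waits; it does not stop the thread, and the blocked socket call inside it keeps running. A minimal sketch (using a sleep as a stand-in for the hung request) shows this:

```python
import threading
import time

# Stand-in for a third-party call that hangs well past our budget.
def hanging_request():
    time.sleep(1.0)

t = threading.Thread(target=hanging_request)
t.daemon = True      # a daemon thread cannot keep the process alive on exit
t.start()

t.join(timeout=0.1)  # returns after ~0.1s, but does NOT kill the thread
still_running = t.is_alive()  # True: the "request" is still in flight
```

So a join timeout can bound the user-facing response time, but the hung request has to be abandoned, not cancelled.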

I managed to dig into the SOAPProxy problem and found out that it uses httplib, and httplib deep down uses socket.makefile(); the docs say:

socket.makefile([mode[, bufsize]])

Return a file object associated with the socket. (File objects are described in File Objects.) The file object references a dup()ped version of the socket file descriptor, so the file object and socket object may be closed or garbage-collected independently. The socket must be in blocking mode (it can not have a timeout). The optional mode and bufsize arguments are interpreted the same way as by the built-in file() function.

Every other SOAP library that I found uses httplib too, one way or another. To complicate matters, I might need to access the database from the requesting thread, and I do not fully understand the consequences of killing a thread with this sort of strategy, so I'm considering doing the database work outside the threads when that is possible.
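One way to keep the database work out of the worker threads is to have each thread hand its result back over a Queue and let the requesting thread do the saving. This is a hedged sketch of that idea (fetch and the payload strings are hypothetical stand-ins for the SOAP/urllib2 calls, not anything from my actual code):

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2
import threading

results = queue.Queue()

def fetch(name):
    # hypothetical stand-in for the blocking third-party request
    results.put((name, 'payload-from-' + name))

workers = [threading.Thread(target=fetch, args=('tp%d' % i,)) for i in (1, 2)]
for w in workers:
    w.daemon = True
    w.start()

collected = {}
for _ in workers:
    try:
        name, payload = results.get(timeout=2.0)
        collected[name] = payload  # the database write would happen here,
                                   # in the request thread, not in the workers
    except queue.Empty:
        break  # a worker hung; give up on the rest
```

With this shape, a hung worker simply never puts anything on the queue, and the get(timeout=...) bounds how long the request thread waits for it.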

Then, my question is:

How can my web service respond to the user in due time and gracefully handle the badly behaving third party servers when the timeout is not respected?


The fact that HTTPResponse uses makefile might not be as bad as I thought: it turns out that makefile can be used unbuffered (bufsize 0), and it can raise timeout exceptions. Here is what I tried:

On one console I ran netcat -l -p 8181 '0.0.0.0'; in another one I opened python2.7 and ran:

>>> import socket
>>> af, socktype, proto, canoname, sa = socket.getaddrinfo('0.0.0.0', 8181, 0, socket.SOCK_STREAM)[0]
>>> s=socket.socket(af, socktype, proto)
>>> s.settimeout(.5)
>>> s.connect(sa)
>>> f=s.makefile('rb', 0)
>>> f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 430, in readline
    data = recv(1)
socket.timeout: timed out
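The same experiment can be scripted end-to-end, replacing netcat with an in-process socket that accepts the connection but never replies (the helper names here are mine, just for illustration):

```python
import socket
import threading

# A throwaway in-process "hanging server": accepts the connection
# but never sends any data, like a badly behaving third party.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def accept_and_hang():
    conn, _ = server.accept()
    threading.Event().wait(2)        # hold the connection open, send nothing
    conn.close()

t = threading.Thread(target=accept_and_hang)
t.daemon = True
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(0.5)
client.connect(('127.0.0.1', port))
f = client.makefile('rb')

timed_out = False
try:
    f.readline()                     # blocks until the 0.5s socket timeout fires
except socket.timeout:
    timed_out = True
```

After roughly half a second, readline() raises socket.timeout instead of blocking forever.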

But my problem of how to make reliable third-party requests persists.

Augusto Hack
  • it seems that [python is really bad at threading](http://blip.tv/carlfk/mindblowing-python-gil-2243379), so I'm considering `multiprocessing` instead; if I need to keep threads I will reduce the work to just the external request ... – Augusto Hack Mar 16 '13 at 14:14
  • The link above is broken, [here is the talk on youtube](http://www.youtube.com/watch?v=Obt-vMVdM8s). Also, I should note that python is not bad at threading; python's threading is for IO, not processing. – Augusto Hack Jan 01 '14 at 00:17

1 Answer


I think I managed to build a reliable solution.

The first thing I do is launch the threads that will request whatever third-party servers are needed. This works well because the GIL is not held while a thread is doing a blocking operation (socket.recv(), for that matter), which allows my server to do its own thing while the requests are being processed.

I removed all side effects from the threads: no more talking to the database. If a request takes longer than expected to respond, I don't need to kill it; I just leave it be and ignore it.

A timer is started when the first thread is launched. After my server does its thing, when it absolutely needs the third-party results, it checks every thread to see if it is done. When they are all done, or a timeout is hit, it collects the result from each finished thread. It looks like this:

start, data = time(), []
threads = launch_threads()
# ... do my thing
for t in threads:  # wait up to TIMEOUT overall, shared across all joins
    timeout = max(0, TIMEOUT - (time() - start))
    t.join(timeout)
for t in threads:
    if not t.isAlive():  # should not have a race
        data.append(t.getdata())
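To make the shared-deadline idea concrete, here is a self-contained sketch of the same pattern (the Fetcher class and the sleep delays are stand-ins I made up to simulate a fast and a hung third party; launch_threads and getdata above are the real code's names):

```python
import threading
import time

TIMEOUT = 0.5  # overall budget for all third-party requests

class Fetcher(threading.Thread):
    def __init__(self, label, delay):
        threading.Thread.__init__(self)
        self.daemon = True
        self.label = label
        self.delay = delay
        self.result = None

    def run(self):
        time.sleep(self.delay)          # stand-in for the blocking request
        self.result = 'data-' + self.label

    def getdata(self):
        return self.result

start = time.time()
threads = [Fetcher('fast', 0.1), Fetcher('slow', 5.0)]
for t in threads:
    t.start()

# ... the server does its own thing here ...

for t in threads:                       # the deadline is shared, not per thread
    remaining = max(0.0, TIMEOUT - (time.time() - start))
    t.join(remaining)

data = [t.getdata() for t in threads if not t.is_alive()]
```

Because the remaining budget is recomputed before each join, the total wait never exceeds TIMEOUT no matter how many threads hang; only the threads that finished in time contribute their data.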
Augusto Hack