I have the following scenario:
I have a web service that upon a single user request aggregates data from some third party servers. The requests to the third parties can be SOAP or plain urllib2 requests with XML data, and each one is done in a separate thread.
Here is an overall picture of what I'm doing:
    from threading import Thread
    from SOAPpy import SOAPProxy  # assuming SOAPpy, which provides SOAPProxy

    class ThirdParty1(Thread):
        def run(self):
            try:
                result = SOAPProxy('http://thirdparty.com', timeout=2).method(params)
                dostuff_and_save(result)  # save results on database
            except Exception:
                log.warn('Ooops')

    class ThirdParty2(Thread): ...
    def myview(params):
        threads = [ThirdParty1(), ThirdParty2()]
        for t in threads:
            t.start()
        for t in threads:
            t.join(timeout=2)
        return result  # this is actually just a token that I use to retrieve the data saved by the threads
My current problem is reliably returning a response to the user's request when any of the third party servers hangs on its side. I have tried setting a timeout on the thread join, on the SOAPProxy object, and via socket.setdefaulttimeout(), but none of the timeouts is respected.
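A side note on the join part: calling t.join(timeout=2) in a loop lets the timeouts stack, so several hanging threads can delay the response by 2 seconds each. A minimal sketch of joining against one shared deadline instead (the helper name is hypothetical):

    import time

    def join_with_deadline(threads, overall_timeout):
        # one deadline shared across all joins, so hanging threads
        # cannot each consume their own full timeout
        deadline = time.time() + overall_timeout
        for t in threads:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            t.join(timeout=remaining)
        # whatever is still alive here is a misbehaving worker
        return [t for t in threads if t.is_alive()]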
I managed to dig into the SOAPProxy problem and found out that it uses httplib, and httplib deep down uses socket.makefile(). The docs say:
socket.makefile([mode[, bufsize]])
    Return a file object associated with the socket. (File objects are described in File Objects.) The file object references a dup()ped version of the socket file descriptor, so the file object and socket object may be closed or garbage-collected independently. The socket must be in blocking mode (it can not have a timeout). The optional mode and bufsize arguments are interpreted the same way as by the built-in file() function.
Every other SOAP library that I found uses httplib too, one way or the other. To complicate matters, I might need to access the database from the requesting thread, and I do not fully understand the consequences of killing a thread under this sort of strategy, so I'm considering doing the database work outside the worker threads when that is possible, as sketched below.
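A minimal sketch of how that handoff could look, using a Queue so the workers only fetch and the requesting thread does the saving (params, log, dostuff_and_save and the token are as in the snippet above; ThirdParty2 is assumed to mirror ThirdParty1):

    import time
    import Queue

    class ThirdParty1(Thread):
        def __init__(self, results):
            Thread.__init__(self)
            self.results = results

        def run(self):
            try:
                result = SOAPProxy('http://thirdparty.com', timeout=2).method(params)
                self.results.put(result)  # hand off instead of touching the database
            except Exception:
                log.warn('Ooops')

    def myview(params):
        results = Queue.Queue()
        threads = [ThirdParty1(results), ThirdParty2(results)]
        for t in threads:
            t.start()
        # drain whatever arrives before the deadline; stragglers are dropped
        deadline = time.time() + 2
        collected = []
        for _ in threads:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                collected.append(results.get(timeout=remaining))
            except Queue.Empty:
                break
        for r in collected:
            dostuff_and_save(r)  # database access stays in the request thread
        return token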
So, my question is:
How can my web service respond to the user in due time and gracefully handle the badly behaving third party servers when the timeout is not respected?
The fact that HTTPResponse uses makefile might not be as bad as I thought: it turns out that the file object makefile returns is really unbuffered when it is created with bufsize=0 (which is what httplib does), and it can raise timeout exceptions. Here is what I tried:
In one console I opened netcat -l -p 8181 '0.0.0.0', in another one I opened python2.7 and ran:
>>> import socket
>>> af, socktype, proto, canoname, sa = socket.getaddrinfo('0.0.0.0', 8181, 0, socket.SOCK_STREAM)[0]
>>> s=socket.socket(af, socktype, proto)
>>> s.settimeout(.5)
>>> s.connect(sa)
>>> f=s.makefile('rb', 0)
>>> f.readline()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/socket.py", line 430, in readline
data = recv(1)
socket.timeout: timed out
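So, despite what the docs say about blocking mode, a timeout set on the socket before makefile() is respected by reads. The same idea works at the HTTP level: httplib.HTTPConnection accepts a timeout argument (Python 2.6+) that it sets on the underlying socket, so both connecting and reading can time out. A minimal sketch (host, path and payload are placeholders):

    import socket
    import httplib

    def fetch_with_timeout(host, path, soap_envelope, timeout=2):
        # the timeout is applied to the underlying socket, so both
        # connecting and reading can raise socket.timeout
        conn = httplib.HTTPConnection(host, timeout=timeout)
        try:
            conn.request('POST', path, body=soap_envelope,
                         headers={'Content-Type': 'text/xml'})
            return conn.getresponse().read()
        except (socket.timeout, socket.error):
            log.warn('third party did not answer in time')
            return None
        finally:
            conn.close()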
But my problem of how to do reliable third-party requests persists.