I'm using Python's http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open indefinitely and sends data periodically as it becomes available; there is no expected response length. Specifically, I'm receiving Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response one byte at a time. The problem is that the program hangs on that line whenever there is no data to read, which happens for long stretches (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
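At the socket level, the "fail instantly" behaviour described here is what a non-blocking socket provides: recv() raises BlockingIOError instead of waiting. The sketch below assumes you have the raw socket object the stream arrives on (http.client's HTTPResponse only exposes the socket's file descriptor, via fileno()), and read_byte_nonblocking is a hypothetical helper name. Two caveats: http.client reads through a buffered file object, so bytes it has already buffered will not show up on the socket, and raw socket bytes still contain any chunked transfer-encoding framing that http.client would normally strip.

```python
import socket
from typing import Optional


def read_byte_nonblocking(sock: socket.socket) -> Optional[bytes]:
    """Return one byte if it is already waiting, else None (never blocks).

    Assumes `sock` is the raw socket the stream arrives on.
    A return value of b"" means the peer closed the connection.
    """
    sock.setblocking(False)  # recv() now raises instead of waiting
    try:
        return sock.recv(1)
    except BlockingIOError:
        return None  # nothing to read right now
```

Calling it in a loop returns None while the stream is idle and a one-byte bytes object as soon as data arrives.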
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
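On the point about timeouts: on a plain socket, a timeout only limits how long a single recv() call waits. When it expires, socket.timeout is raised and the connection stays open for the next call, as the hypothetical helper read_with_timeout below shows. The catch for this question is that http.client reads through a file object created with socket.makefile(), and that object refuses all further reads after one timeout (raising OSError: cannot read from timed out object), so this pattern only applies when reading the socket directly.

```python
import socket
from typing import Optional


def read_with_timeout(sock: socket.socket, timeout: float = 0.5) -> Optional[bytes]:
    """Wait up to `timeout` seconds for data; return None if none arrived.

    The timeout bounds only this one recv() call; the connection itself
    stays open and can be read again afterwards.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(4096)
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        return None
```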
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code reads the response; it runs in a thread separate from the main one:
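    while True:
        try:
            # Blocks until a byte arrives (or the socket times out)
            readByte = dc.request.read(1)
        except Exception:
            readByte = b''  # treat a failed read as "no data"
        if len(readByte) != 0:
            # Append the byte to the shared response while holding the lock
            with dc.responseLock:
                dc.response = dc.response + chr(readByte[0])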
Note that the request is stored in dc.request and the response in dc.response; both are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
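The shared-string-plus-Lock handoff described above works; a common alternative is queue.Queue, which does its own locking. In this hypothetical sketch the reader thread would call chunks.put(data) for each piece it reads, and the main thread calls drain() whenever it wants everything received so far, without ever blocking.

```python
import queue


def drain(chunks: "queue.Queue[bytes]") -> bytes:
    """Collect everything the reader thread has queued so far, without blocking."""
    parts = []
    while True:
        try:
            parts.append(chunks.get_nowait())
        except queue.Empty:
            return b"".join(parts)  # queue is empty: return what we gathered
```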
This works perfectly while it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True and the thread should stop.
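For the abort flag, one approach is to have the reader wait with a bounded select() and re-check the flag between waits, so the thread notices the flag within poll_interval seconds instead of whenever the next byte happens to arrive. The sketch uses threading.Event as the flag (standing in for dc.twitterAbort) and a raw socket; reader_loop and its parameters are hypothetical. select() also accepts an HTTPResponse directly because it has a fileno() method, but bytes already sitting in http.client's internal buffer are invisible to select(), so it can report nothing to read while data is in fact waiting.

```python
import select
import socket
import threading


def reader_loop(sock: socket.socket, abort: threading.Event,
                received: bytearray, poll_interval: float = 0.2) -> None:
    """Append data from `sock` to `received` until `abort` is set or the peer closes.

    select() waits at most `poll_interval` seconds, so the abort flag is
    re-checked at least that often even when no data is arriving.
    `received` is not locked: read it only after joining the thread,
    or guard it with a Lock as in the question.
    """
    while not abort.is_set():
        ready, _, _ = select.select([sock], [], [], poll_interval)
        if not ready:
            continue  # nothing arrived; go round and re-check the flag
        data = sock.recv(4096)
        if not data:
            break  # b"" means the peer closed the connection
        received.extend(data)
```

Stopping then amounts to abort.set() followed by a join() on the reader thread, which should return within roughly poll_interval seconds.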
But it doesn't. The thread stays alive well after the flag is set, stuck on the dc.request.read(1) call. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but that takes around 10 seconds.
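To stop the thread immediately rather than after a polling interval, another option is to wake the blocked call from the thread that sets the abort flag: calling shutdown() on a socket makes a recv() blocked on it return b"" at once. This sketch (hypothetical stop_reader) assumes you hold the socket the reader thread is blocked on. If the reader goes through HTTPResponse.read(), that call should then return or raise rather than keep waiting, and the try/except around the read would catch a raised error; this is worth verifying against your own code.

```python
import socket
import threading


def stop_reader(sock: socket.socket, reader: threading.Thread,
                timeout: float = 2.0) -> bool:
    """Wake a thread blocked in sock.recv() and wait for it to exit.

    shutdown() makes a pending recv() on this socket return b"" right away;
    close() alone is not guaranteed to do that on every platform.
    Returns True if the reader thread finished within `timeout` seconds.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # socket already disconnected
    reader.join(timeout)
    return not reader.is_alive()
```

Note that this ends the stream for good, which is what an abort calls for; a new connection is needed to resume.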
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method works for getting Tweets; the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking an easier way to do this.