
I use this script to download a JSON file about once a minute and save it to a unique file name. Sometimes it just hangs: it successfully saves the file that the printed line indicates, but then waits for hours until I notice.

My questions are: 1) is there something obvious I haven't thought of to add (some kind of time-out?), and 2) is there something I can do to find out where it is stuck when it hangs (other than putting a print line between every other line)?

If the internet connection is unresponsive, I see the "has FAILED" line about once a minute, as expected, until the connection is working again, so that doesn't seem to be the problem.

Note: I save and load n this way to be a little robust against random crashes, restarts, etc.

import json
import time
import urllib2
import numpy as np

# n is a counter persisted to disk so the file numbering survives crashes and restarts
n = np.load("n.npy")
print "loaded n: ", n

n += 10  # leave a gap after a restart

np.save("n", n)
print "saved: n: ", n

url = "http:// etc..."

for i in range(10000):

    n = np.load("n.npy")
    n += 1

    try:
        req        = urllib2.Request(url, headers={"Connection": "keep-alive",
                                                   "User-Agent": "Mozilla/5.0"})
        response   = urllib2.urlopen(req)

        dictionary = json.loads(response.read())

        # zero-padded name, e.g. n = 123 -> "info_00123"
        filename   = "info_" + str(100000 + n)[1:]
        with open(filename, 'w') as outfile:
            json.dump(dictionary, outfile)

        np.save("n", n)
        print "n, i = ", n, i, filename, "len = ", len(dictionary)
    except:
        print "n, i = ", n, i, " has FAILED, now continuing..."
    time.sleep(50)
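
Regarding question 2 (finding out where it is stuck): one low-cost option, along the lines of the comments below, is to write a timestamp to a log file immediately before and after the network call, since that is the most likely place for an indefinite block. This is only a minimal sketch, using a hypothetical timing.log file and the same placeholder url/req as in the script above:

import time
import urllib2

url = "http:// etc..."   # same placeholder URL as in the script above

req = urllib2.Request(url, headers={"Connection": "keep-alive",
                                    "User-Agent": "Mozilla/5.0"})

# If the script hangs, the last line written to timing.log shows whether it
# got stuck inside urlopen() or later. Note that response.read() can also
# block on a dead connection, so it is logged separately.
with open("timing.log", "a") as log:
    log.write("before urlopen: %s\n" % time.ctime())
    log.flush()
    response = urllib2.urlopen(req)
    log.write("after urlopen:  %s\n" % time.ctime())
    log.flush()
    body = response.read()
    log.write("after read:     %s\n" % time.ctime())
    log.flush()
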
  • If this happens often (every 2nd or 3rd time), have you tried debugging? You can use https://docs.python.org/2/library/pdb.html – Muhammad Tahir Oct 18 '15 at 11:38
  • Only about once a day. I haven't tried a debugger; since I need to record data as continuously as possible right now, can I run from the debugger and read a "debug" report offline while I restart data taking? (I'll start reading your link now) – uhoh Oct 18 '15 at 11:41
  • No, I don't think that will be possible (reading a debug log offline later). Debugging means running your code manually one line at a time, so you will be able to find which line of code is causing this. – Muhammad Tahir Oct 18 '15 at 11:44
  • 1
    `urllib2.urlopen(req)` seems to be the only point which can cause this. You can log time before and after calling `urlopen` this way you will be able to see if `urlopen` is the reason. – Muhammad Tahir Oct 18 '15 at 11:47
  • 1
    Or you you can add timeout which you mentioned in your question. Here is the link to help you with timeout http://stackoverflow.com/questions/16646322/setting-the-timeout-on-a-urllib2-request-call – Muhammad Tahir Oct 18 '15 at 11:49
  • I see! Great! Thanks @MuhammadTahirButt. When I look at this documentation docs.python.org/2/library/urllib2.html I am not sure what will happen if there is a timeout. Does it raise an exception, and should I use try? Oh, it seems this [answer](http://stackoverflow.com/a/2712686/3904031) may apply. – uhoh Oct 18 '15 at 12:04
  • 1
    Sorry I don't know what will happen in that case, you will have to test it with urllib2. But I would suggest using requests http://docs.python-requests.org/en/latest/ library for this. It has timeout and raise Timeout exception in case of timeout. – Muhammad Tahir Oct 18 '15 at 12:09
  • It's been working flawlessly now for a few days, just by adding a generous `timeout=20` as you recommended. It seems that the `try` / `except` is handling it. I can disconnect from the internet, close my laptop, anything, and it just recovers when everything is OK. Problem solved! Thanks! – uhoh Oct 20 '15 at 00:58 (see the timeout sketch below)
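
For reference, here is a minimal sketch of the fix the comments converge on: pass `timeout=20` (the generous value mentioned above) to `urllib2.urlopen`, so that an unresponsive connection raises an exception (`urllib2.URLError`, or `socket.timeout` during the read) instead of blocking forever; the script's existing `try` / `except` then treats it like any other failed attempt. The explicit exception tuple here is only to make the behaviour visible; the bare `except:` in the original script would catch these too.

import json
import socket
import urllib2

url = "http:// etc..."   # same placeholder URL as in the question

req = urllib2.Request(url, headers={"Connection": "keep-alive",
                                    "User-Agent": "Mozilla/5.0"})
try:
    # timeout is in seconds: if the connection or the read stalls for
    # longer than 20 s, an exception is raised instead of hanging.
    response   = urllib2.urlopen(req, timeout=20)
    dictionary = json.loads(response.read())
except (urllib2.URLError, socket.timeout) as e:
    print "request FAILED (%s), will retry on the next cycle" % e
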

0 Answers