I'm using a Python spider to crawl the internet using a urllib2 OpenerDirector. The problem is that a connection will inevitably hang on an https address, apparently ignoring the timeout value.
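For context, here is roughly the kind of setup I mean — a minimal sketch, not my exact crawler. It also shows the `socket.setdefaulttimeout()` belt-and-braces workaround, since the per-request `timeout` argument may not cover every socket operation in older versions:

```python
import socket

try:
    import urllib2                    # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent, for illustration only

# Process-wide fallback: any socket created without an explicit timeout
# inherits this default instead of blocking forever.
socket.setdefaulttimeout(10)

opener = urllib2.build_opener()
# opener.open('https://example.com/', timeout=10)  # per-request timeout
```

The commented-out `open()` call is just to show where the per-request timeout goes; the URL is a placeholder.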
One solution would be to run the request in a thread, then kill and restart the thread if it hangs. However, Python apparently doesn't support killing threads, and doing so is considered a Bad Idea because of garbage-collection and other issues. Even so, this approach would be my preference because of its simplicity.
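To make the thread idea concrete: since the thread can't be killed safely, the closest workable variant I can think of is to *abandon* it — run the fetch in a daemon thread and stop waiting after a deadline. A rough sketch (`fetch_with_deadline` is just an illustrative name, not a real API):

```python
import threading

def fetch_with_deadline(func, args=(), deadline=15):
    """Run func(*args) in a daemon thread; give up after `deadline` seconds.

    The hung thread is NOT killed -- Python has no safe way to do that --
    but as a daemon it won't keep the interpreter alive at exit.
    Exceptions inside func are swallowed here for brevity.
    """
    result = []
    worker = threading.Thread(target=lambda: result.append(func(*args)))
    worker.daemon = True
    worker.start()
    worker.join(deadline)
    if worker.is_alive() or not result:
        return None  # timed out (or func raised); the thread is abandoned
    return result[0]
```

The obvious downside is that abandoned threads pile up holding sockets and memory until the process exits, which is exactly the garbage-collection worry mentioned above.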
Another idea would be to switch to an asynchronous library like Twisted, but that doesn't solve the underlying problem of the connection hanging.
I either need a way to force-interrupt the call, or a fix for how the urllib2 OpenerDirector handles timeouts. Thanks.