
Hi Stackoverflow community,

I'm trying to get familiar with the urllib.request standard library and use it in my scripts at work instead of wget. However, I cannot get the detailed HTTP messages displayed, whether in IDLE, from a script file, or by typing the commands manually into cmd (py).

I'm using Python on Windows 7 x64 and have tried 3.5 and 3.6, including 3.6.1rc1, without success.

The messages are supposedly turned on using this command:

http.client.HTTPConnection.debuglevel = 1

Here is my sample code. It runs, but no details are displayed:

import http.client
import urllib.request
http.client.HTTPConnection.debuglevel = 1
response = urllib.request.urlopen('http://stackoverflow.com')
content = response.read()
with open("stack.html", "wb") as file:
    file.write(content)

I have also tried .set_debuglevel(1) without success. There is a years-old question here, "Turning on debug output for python 3 urllib", but it uses the same approach I have, and it does not work. In a comment on that question, user Yen Chi Hsuan says it's a bug and reported it here: https://bugs.python.org/issue26892

The bug was closed in June 2016, so I would expect it to be fixed in recent Python versions.

Maybe I'm missing something (e.g. something else needs to be enabled or installed), but I have spent some time on this and reached a dead end.

Is there a working way to have the detailed HTTP messages displayed with urllib on Python 3 on Windows?

Thank you

EDIT: the approach suggested by pvg works on the simple example, but I cannot make it work in a case where a login is needed. The HTTPBasicAuthHandler does not have this debuglevel attribute, and when I try combining multiple handlers into the opener, it does not work either.

import urllib.request

userName = 'mylogin'
passWord = 'mypassword'
top_level_url = 'http://page-to-login.com'

# create an authorization handler
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, top_level_url, userName, passWord)

auth_handler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)

result = opener.open(top_level_url)
content = result.read()
alleby
  • You can try copying exactly what's in the test case for the bug or just switching to urllib3 and using `urllib3.add_stderr_logger()` – pvg Mar 18 '17 at 16:12

2 Answers


The example in the issue you linked shows the working code, a version reproduced below:

import urllib.request

handler = urllib.request.HTTPHandler(debuglevel=10)
opener = urllib.request.build_opener(handler)
content = opener.open('http://stackoverflow.com').read()

print(content[0:120])
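
For the login case described in the question's edit, one possible approach is to pass the debug-enabled handlers and the auth handler to build_opener together. This is only a sketch: build_debug_opener is a hypothetical helper name, the URL and credentials are placeholders, and it assumes the interpreter was built with SSL support so that HTTPSHandler exists.

```python
import urllib.request


def build_debug_opener(top_level_url, user, password, debuglevel=1):
    """Return an opener that does basic auth and prints HTTP debug output.

    Hypothetical helper, not part of urllib. build_opener() replaces its
    default HTTPHandler/HTTPSHandler with the instances passed in, so the
    debuglevel given here is the one that gets used.
    """
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, top_level_url, user, password)

    handlers = [
        urllib.request.HTTPHandler(debuglevel=debuglevel),
        urllib.request.HTTPBasicAuthHandler(passman),
    ]
    # HTTPSHandler is only defined when SSL support is available.
    if hasattr(urllib.request, "HTTPSHandler"):
        handlers.append(urllib.request.HTTPSHandler(debuglevel=debuglevel))

    return urllib.request.build_opener(*handlers)


# Usage (performs a real request, so it is left commented out):
# opener = build_debug_opener('http://page-to-login.com', 'mylogin', 'mypassword')
# content = opener.open('http://page-to-login.com').read()
```

The auth handler itself needs no debuglevel, because the debug output comes from the HTTP(S) handler that actually opens the connection.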

The handler approach is pretty clunky; another option is to use a friendlier library like urllib3 (http://urllib3.readthedocs.io/en/latest/).

import urllib3

urllib3.add_stderr_logger()
http = urllib3.PoolManager()
r = http.request('GET', 'http://stackoverflow.com')
print(r.status)

If you decide to use the requests library instead, the following answer describes how to set up logging:

How can I see the entire HTTP request that's being sent by my Python application?

pvg
  • Thank you! The basic example I provided works fine, but when I want to combine it with HTTPBasicAuthHandler to access a page where a login is needed, I'm not able to set the debuglevel, as it has no such attribute. So I was hoping for a way to turn on the debuglevel "globally" for HTTP requests, as the original example shows :/ – alleby Mar 19 '17 at 11:19
  • I'm not sure if this comment means this is an acceptable answer or if your question has other requirements. If it's the latter, update your question to explain. – pvg Mar 19 '17 at 17:28
  • Hi, yes, you answered my question, but the method does not always work. I edited my question as suggested. If needed, I can open a new question and close this one. Thx – alleby Mar 20 '17 at 08:51
  • Well, that's a pretty different question, you really should be asking about the specific thing you're trying to accomplish. Additionally, providing some context as to your motivation is helpful, clearly you're not trying to debug urllib itself so why are you so keen on getting its internal debug messages? There are more general ways to trace an http flow if that's what you're after. And again, a higher-level library like requests might be appropriate. So perhaps you should indeed write a new question that actually describes what you're after. – pvg Mar 20 '17 at 09:01

Ever since Python version 3.5.2 (released around June 2016), http.client.HTTPConnection.debuglevel has been entirely ignored in favor of the debuglevel constructor argument of urllib.request.HTTPHandler.

This is due to a change that sets the value of http.client.HTTPConnection.debuglevel to whatever is passed as the debuglevel constructor argument of urllib.request.HTTPHandler.
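
The effect can be illustrated without any network traffic. The following is a rough sketch of the mechanism as described above, not the actual urllib source: it assumes the handler calls set_debuglevel on each connection it creates, and example.invalid is a placeholder host that is never contacted.

```python
import http.client

# The "global" switch from the question: a class attribute.
http.client.HTTPConnection.debuglevel = 1

# Constructing a connection does not open a socket.
conn = http.client.HTTPConnection("example.invalid")
inherited = conn.debuglevel      # read from the class attribute

# A handler built with debuglevel=0 applies its own value to the connection,
# which creates an instance attribute that shadows the class attribute.
conn.set_debuglevel(0)
overridden = conn.debuglevel

# Restore the library default so the sketch has no lasting side effect.
http.client.HTTPConnection.debuglevel = 0

print(inherited, overridden)  # prints "1 0"
```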

A PR has been opened to fix this, but in the meantime you can either use the constructor argument of HTTPHandler and HTTPSHandler (as pvg's answer points out), or you can monkey patch the __init__ methods of HTTPHandler and HTTPSHandler to respect the global values, like so:

import http.client
import urllib.request

https_old_init = urllib.request.HTTPSHandler.__init__

def https_new_init(self, debuglevel=None, context=None, check_hostname=None):
    # fall back to the global value when no explicit debuglevel is given
    debuglevel = debuglevel if debuglevel is not None else http.client.HTTPSConnection.debuglevel
    https_old_init(self, debuglevel, context, check_hostname)

urllib.request.HTTPSHandler.__init__ = https_new_init

http_old_init = urllib.request.HTTPHandler.__init__

def http_new_init(self, debuglevel=None):
    # same fallback for plain HTTP, reading HTTPConnection rather than HTTPSConnection
    debuglevel = debuglevel if debuglevel is not None else http.client.HTTPConnection.debuglevel
    http_old_init(self, debuglevel)

urllib.request.HTTPHandler.__init__ = http_new_init

Note: I don't recommend reading the global debuglevel in the default value of the constructor argument of HTTPHandler, because default values for arguments are evaluated when the function definition is executed, which, for HTTPHandler's constructor, is when the module urllib.request is imported.
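
The difference can be seen in a small standalone sketch. Settings, early and late are made-up names for illustration only; Settings stands in for http.client.HTTPConnection.

```python
class Settings:
    # stand-in for http.client.HTTPConnection (illustrative only)
    debuglevel = 0


def early(debuglevel=Settings.debuglevel):
    # the default was captured once, when this def statement ran
    return debuglevel


def late(debuglevel=None):
    # the global value is looked up on every call
    return Settings.debuglevel if debuglevel is None else debuglevel


Settings.debuglevel = 1   # changed after the functions were defined

print(early())  # 0: the stale value captured at definition time
print(late())   # 1: the current value
```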

wheeler