
I am trying to fetch a page, but urlopen hangs and never returns anything, even though the web page is very light and opens in any browser without problems.

import urllib.request
with urllib.request.urlopen("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") as response:
    print(response.read())

This simple code just freezes while retrieving the response, but opening http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm in a browser works without any problem.

wpercy
Kabal
  • One option would be to use wireshark to see the difference between your python request and your browser's request. – Gillespie May 15 '17 at 19:34
  • A second option is to try a different commandline tool. Are you able to fetch with `curl` or `wget`? If not, that indicates an environment problem such as proxy settings – Gillespie May 15 '17 at 19:35
  • A third option is to try an alternative to `urllib` such as [requests](http://docs.python-requests.org/en/master/) (which I highly recommend as a longtime python user, btw) – Gillespie May 15 '17 at 19:36
  • Thanks for your comments, I am able to do both `wget` and `curl`. I tried using requests and it just freezes – Kabal May 15 '17 at 19:48
  • Found this question while trying to troubleshoot a similar issue. cURL or a browser could open the page fine, but urlopen couldn't. Then I found it only happened on wi-fi and worked fine over wired ethernet. It turned out (I think) that the address resolution part of the connection was trying to resolve using IPv6 when on wi-fi. I disabled IPv6 for the wi-fi connection (on Ubuntu) and urlopen worked fine after that. – user479 Nov 24 '18 at 22:17

1 Answer


www.planalto.gov.br appears to use user-agent detection. If you send a browser-like user-agent, the request completes correctly. The urllib library didn't crash; it's just waiting for a response.

curl -H "User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

worked just fine for me but

curl http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

did not.

As RPGillespie said above, use urllib.request (urllib2 on Python 2) or requests to add the User-Agent header (see How do I set headers using python's urllib? for more information about that).
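For the Python 3 code in the question, a minimal sketch of that fix could look like the following. The helper names build_request and fetch are made up for illustration, the User-Agent string is simply the one from the curl command above, and the 10-second timeout is an arbitrary choice so that a stalled request raises an exception instead of blocking forever.

```python
import urllib.request

# Browser-like User-Agent, copied from the curl command above (an assumption:
# any common browser string may work just as well).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/57.0.2987.133 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Hypothetical helper: wrap the URL in a Request carrying a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Hypothetical helper: fetch the page body as bytes, giving up after `timeout` seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling fetch("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") should then return the page bytes if the user-agent really is what the server checks. With requests, the equivalent would be passing headers={"User-Agent": ...} and a timeout to requests.get.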

olamork