urllib2.open error in python

Question

I can't get the contents of a URL.

base_url = "http://status.aws.amazon.com/"
    socket.setdefaulttimeout(30) 
htmldata = urllib2.urlopen(base_url)
for url in parser.url_list:
        get_rss_th = threading.Thread(target=parser.get_rss,name="get_rss_th", args=(url,))
get_rss_th.start()

    print htmldata

<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>

When I call htmldata.read() instead (as in "Python error when using urllib.open"), I get a blank screen.

I'm using Python 2.7.

Whole code: https://github.com/tech-sketch/zabbix_aws_template/blob/master/scripts/AWS_Service_Health_Dashboard.py

The problem is that I can't get any output from the URL (RSS feed): the variable data = zbx_client.recv(4096) is empty, with no status.

Other than the indentation error, your code should work. It works for me locally, and [on repl.it](https://repl.it/repls/SeashellForkedCodewarrior). — abarnert, Aug 24 '18 at 21:31
Meanwhile, is there a reason you're using `urllib2` instead of installing `requests`, the way [`urllib2`'s own docs](https://docs.python.org/2/library/urllib2.html) suggest? It shouldn't make any difference here, but then your code should be working, so… — abarnert, Aug 24 '18 at 21:31
First, please [edit] the relevant code—as a runnable [mcve]—into your question, don't just give us an external link. — abarnert, Aug 24 '18 at 23:00
Second, there is no attempt to `print htmldata.read()` in your linked code, so I still have no idea where your problem is. Is it just that you added a `print htmldata.read()` for debugging purposes after the `parser.feed(htmldata.read())`? If so, that's your problem: when you `read()` a file-like object, it reads the whole thing. After that, you're at the end of the file, so if you try to `read()` the whole file from there, you get nothing. If so, all you have to do to fix it is only read once, like `contents = htmldata.read()`, then you can `parser.feed(contents)` and `print contents`. — abarnert, Aug 24 '18 at 23:02
@abarnert, yes, i added print htmldata for debugging, your solution helped — , Aug 25 '18 at 00:08

score 0 · Accepted Answer · answered Aug 25 '18 at 00:47

There's no real problem with your code (except for a bunch of indentation errors and syntax errors that apparently aren't in your real code), only with your attempts to debug it.

First, you did this:

print htmldata

That's perfectly fine, but since htmldata is a urllib2 response object, printing it just prints that response object. Which apparently looks like this:

<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>

That doesn't look like particularly useful information, but that's the kind of output you get when you print something that's only really useful for debugging purposes. It tells you what type of object it is, some unique identifier for it, and the key members (in this case, the socket fileobject wrapped up by the response).
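To make the difference between the object and its contents concrete, here is a minimal sketch. It assumes an in-memory io.BytesIO as a stand-in for the urllib2 response (so it runs without a network connection), and the HTML payload is made up for illustration:

```python
import io

# Assumption: io.BytesIO stands in for the object returned by urllib2.urlopen().
# Both are file-like objects, so printing them behaves the same way.
response = io.BytesIO(b"<html>status page</html>")

description = repr(response)  # what printing the object itself shows
body = response.read()        # the actual payload

print(description)  # type name plus a memory address, not the page
print(body)         # the page contents
```

The first line printed only describes the object; the second is the data you were after.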

Then you apparently tried this:

print htmldata.read()

But you had already called read() on the same object earlier:

parser.feed(htmldata.read())

When you read() the same file-like object twice, the first time gets everything in the file, and the second time gets everything after everything in the file—that is, nothing.
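The double-read behavior is easy to reproduce without touching the network. This sketch again uses io.BytesIO as an assumed stand-in for the response object, with a made-up payload:

```python
import io

# Assumption: io.BytesIO stands in for the urllib2 response; the payload is invented.
stream = io.BytesIO(b"<rss>feed</rss>")

first = stream.read()   # consumes everything up to the end of the stream
second = stream.read()  # already at the end, so nothing is left

print(repr(first))
print(repr(second))
```

The second read() returns an empty string rather than raising an error, which is why the symptom is a blank screen instead of a traceback.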

What you want to do is read() the contents once, into a string, and then you can reuse that string as many times as you want:

contents = htmldata.read()

parser.feed(contents)

print contents
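As a fuller sketch of the same read-once pattern, the following runs offline. LinkCollector is a hypothetical stand-in for the parser in the linked script (whose real implementation is not shown here), io.BytesIO again stands in for the urllib2 response, and the sample HTML is invented:

```python
import io

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class LinkCollector(HTMLParser):
    """Hypothetical parser: collects the href of every <a> tag into url_list."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.url_list = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.url_list.append(value)


# Assumption: stand-in for urllib2.urlopen(base_url); the HTML is made up.
htmldata = io.BytesIO(b'<a href="/rss/ec2.rss">EC2</a>')

contents = htmldata.read().decode("utf-8")  # read exactly once
parser = LinkCollector()
parser.feed(contents)   # first use of the string
print(contents)         # second use; the string is still intact
print(parser.url_list)
```

Because contents is an ordinary string, adding or removing the debugging print no longer changes what the parser sees.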

It's also worth noting that, as the urllib2 documentation said right at the top:

See also The Requests package is recommended for a higher-level HTTP client interface.

Using urllib2 can be a big pain, in a lot of ways, and this is just one of the more minor ones. Occasionally you can't use requests because you have to dig deep under the covers of HTTP, or handle some protocol it doesn't understand, or you just can't install third-party libraries, so urllib2 (or urllib.request, as it's renamed in Python 3.x) is still there. But when you don't have to use it, it's better not to. Even pip, which Python itself bundles through the ensurepip bootstrapper, uses a vendored copy of requests instead of urllib2.

With requests, the normal way to access the contents of a response is with the content (for binary) or text (for Unicode text) properties. You don't have to worry about when to read(); it does it automatically for you, and lets you access the text over and over. So, you can just do this:

import requests
base_url = "http://status.aws.amazon.com/"
response = requests.get(base_url, timeout=30)
parser.feed(response.content) # assuming it wants bytes, not unicode
print response.text

score -2 · Answer 2 · answered Aug 24 '18 at 21:24

If I use this code:

import urllib2
import socket
base_url = "http://status.aws.amazon.com/"
socket.setdefaulttimeout(30)
htmldata = urllib2.urlopen(base_url)
print(htmldata.read())

I get the page's HTML code.

Pablo Santa Cruz
