urllib error: cannot fetch html data. Python, Beagle Bone Black

Question

I was developing my project on a Mac and tried to do the same thing on a BeagleBone Black (BBB). However, urllib does not work on the BBB, so I am stuck. (It works fine on my Mac.)

I tried this simple code as an example:

import urllib
conn = urllib.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')

Then this error occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 351, in open_http
    'got a bad status line', None)
IOError: ('http protocol error', 0, 'got a bad status line', None)

I need to fetch HTML data for my project. How can I solve this problem? Do you have any ideas? Thank you.

When I tried urllib2 I got this:

>>> import urllib2
>>> conn = urllib2.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

I also tried this:

curl http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content
curl: (52) Empty reply from server

and this:

wget http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content
Connecting to stackoverflow.com (198.252.206.16:80)
wget: error getting response

Neither of them worked.

At home I tried again and it also failed, but with a different error:

conn = urllib2.urlopen('http://stackoverflow.com/questions/8479736/using-python-urllib-how-to-avoid-non-html-content')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
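
The `[Errno -2] Name or service not known` in this last traceback is raised when the hostname cannot be resolved, so it points at name resolution rather than at HTTP. A minimal sketch for checking resolution on its own, using only the standard `socket` module; the helper name `resolve` is hypothetical and not part of the question:

```python
import socket

def resolve(host, port=80):
    """Return the sorted IPv4 addresses for host, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Lookup failures of this kind are what urllib2 wraps in URLError.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted(set(info[4][0] for info in infos))
```

If `resolve('stackoverflow.com')` comes back empty on the board, the DNS configuration of that network is the first thing to look at.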

Environment

BBB: Linux beaglebone 3.8.13 #1 SMP Tue Jun 18 02:11:09 EDT 2013 armv7l GNU/Linux
python version: 2.7.3

Is there a reason you're using the deprecated `urllib.urlopen` instead of `urllib2.urlopen`? I doubt that's your problem, but still, it's not a good idea to use deprecated functions unless you have a good rationale. – abarnert Nov 12 '13 at 06:35
Meanwhile, have you tried reading the same URL from the console with `curl` or `wget` to make sure it works? If so, you may need to go to a lower level in Python (either `httplib`, or just create a `socket` and send a hand-crafted request) and/or put a simple proxy in between to see what your code is doing wrong, what that status line is that it doesn't like, etc. – abarnert Nov 12 '13 at 06:36
Yes, you are right, I shouldn't use a deprecated function. Anyway, I got the error even when I used `urllib2`; I added the traceback above. I tried both `curl` and `wget`, but both gave an error. – SamuraiT Nov 12 '13 at 06:46
OK, there is clearly something wrong with your system's setup, which has nothing to do with Python, or with programming at all. So you'll need to fix your system. Maybe try [SuperUser](http://superuser.com), or some BeagleBone-specific site. – abarnert Nov 12 '13 at 07:00
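
The lower-level check suggested in the comments (open a `socket` and send a hand-crafted request) can be sketched roughly as follows. This is only an illustration: the function names `build_request` and `read_status_line` are hypothetical, and the sketch sticks to calls that exist in both Python 2.7 and Python 3.

```python
import socket

def build_request(host, path="/"):
    """Build a minimal hand-crafted HTTP/1.0 GET request as bytes."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def read_status_line(host, port=80, path="/", timeout=10):
    """Send the request and return the first response line as bytes.

    Returns b"" if the peer closes the connection without sending anything.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(build_request(host, path))
        data = b""
        # Read until the end of the first line or until the peer hangs up.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:  # connection closed by the other side
                break
            data += chunk
    finally:
        sock.close()
    return data.split(b"\r\n", 1)[0]
```

An empty result from `read_status_line('stackoverflow.com')` would match the `BadStatusLine: ''` and the `Empty reply from server` from `curl` shown in the question: the connection was closed before any response arrived, which is consistent with a network-level problem rather than a Python one.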

score -1 · Answer 1 · answered Nov 12 '13 at 06:51

I really want to recommend the requests library:

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'

http://www.python-requests.org/en/latest/

How to install:

sudo pip install requests

Dmitry Zagorulkin


Why do you think `requests` will help? He's doing a dead-simple command with no auth or anything else fancy, so it will do the exact same thing as his existing one-liner. And, given that he can't even connect from the command line with `curl` or `wget`, it will have the exact same error. – abarnert Nov 12 '13 at 06:59
Of course, if he has a network connection error, then no library will help. – Dmitry Zagorulkin Nov 12 '13 at 07:18
