
I tried to download an HTML file like this:

import urllib

req  = urllib.urlopen("http://www.stream-urls.de/webradio")
html = req.read()

print html

html = html.decode('utf-16')

print html

Since the output of req.read() looks like Unicode, I tried to decode the response, but I get this error:

Traceback (most recent call last):
  File "e:\Documents\Python\main.py", line 8, in <module>
    html = html.decode('utf-16')
  File "E:\Software\Python2.7\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 38-39: illegal UTF-16 surrogate

What do I have to do to get the right encoding?

furas
avb
  • Well... could you at the very least tell us what the bytes in positions 38-39 are? – barak manos Dec 20 '16 at 12:12
  • BTW, the problem at hand has nothing to do with `urllib`, nor with HTML. It concerns only character encoding, so you might want to rephrase and minimize your question to focus on this problem, and this problem only. – barak manos Dec 20 '16 at 12:13
  • That page returns (gzipped - i.e. not plain text) `charset=utf-8`. Why are you decoding w/ utf-16? – Alex K. Dec 20 '16 at 12:14
  • I partially take my second comment back. The specific URL is important in attempting to investigate this issue. – barak manos Dec 20 '16 at 12:29

1 Answer


Use requests: it decompresses the gzipped response for you, so r.text is the correct, readable HTML.

import requests

r  = requests.get("http://www.stream-urls.de/webradio")
print r.text

EDIT: How to use gzip and StringIO to decompress the data in memory, without saving it to a file:

import urllib
import gzip
import StringIO

req  = urllib.urlopen("http://www.stream-urls.de/webradio")

# create file-like object in memory
buf = StringIO.StringIO(req.read())

# create gzip object using file-like object instead of real file on disk
f = gzip.GzipFile(fileobj=buf)

# get data from file
html = f.read()

print html
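
The decompressed data is still a byte string, so to get Unicode text you would decode it with the charset the server declares (utf-8 according to the comment on the question, not utf-16). Below is a minimal sketch of both steps that runs without network access. The helper name decode_body and the sample page are made up for illustration, and the check for the gzip magic bytes is my own addition, not something urllib does for you.

```python
from __future__ import print_function  # keeps the sketch valid on Python 2.7 and 3

import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the body as text, gunzipping first if needed (hypothetical helper)."""
    # gzip streams start with the magic bytes 0x1f 0x8b
    if content_encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    # decode with the declared charset (assumed utf-8 here)
    return raw.decode(charset)


# Stand-in for req.read(): a small UTF-8 page, gzipped locally (made-up sample data)
page = u"<html><body>Webradio \u00fc</body></html>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(page.encode("utf-8"))
raw = buf.getvalue()

text = decode_body(raw, content_encoding="gzip")
print(text == page)
```

With the urllib code above you would pass req.read() as raw. In Python 2, req.info().getheader('Content-Encoding') should give you the value for content_encoding, assuming the server sends that header.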
furas
  • `requests` is not a built-in package (at least not in Python 2.x). Can you please indicate how to `pip` it? – barak manos Dec 20 '16 at 12:30
  • BTW: [Does python urllib2 automatically uncompress gzip data fetched from webpage?](http://stackoverflow.com/questions/3947120/does-python-urllib2-automatically-uncompress-gzip-data-fetched-from-webpage) - it shows how to use `gzip` module to ungzip data from server. – furas Dec 20 '16 at 12:33