I'm using urllib and urllib2 in Python to open and read webpages, but sometimes the text I get back is unreadable. For example, if I run this:
import urllib
text = urllib.urlopen('http://tagger.steve.museum/steve/object/141913').read()
print text
I get some unreadable text. I've read these posts:
Does python urllib2 automatically uncompress gzip data fetched from webpage?
but can't seem to find my answer.
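Since that post is about gzip, one way to check whether the unreadable text is actually compressed data is to look at the first two bytes of the response. This is only a sketch: the helper names `looks_gzipped` and `maybe_gunzip` are made up for illustration, and I'm assuming the response body is available as a raw byte string (as returned by `.read()`). It should run on both Python 2.7 and Python 3.

```python
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # the first two bytes of every gzip stream


def looks_gzipped(data):
    """Return True if the raw response bytes start with the gzip signature."""
    return data[:2] == GZIP_MAGIC


def maybe_gunzip(data):
    """Decompress gzip data; return anything else unchanged."""
    if looks_gzipped(data):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        return zlib.decompress(data, 16 + zlib.MAX_WBITS)
    return data
```

If `looks_gzipped(text)` is False for the page in question, compression is probably not the cause and the server is likely sending different content for some other reason.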
Thank you in advance for your help!
UPDATE: I fixed the problem by 'convincing' the server that my user-agent is a browser and not a crawler.
import urllib
class NewOpener(urllib.FancyURLopener):
    # FancyURLopener sends this class attribute as the User-Agent header
    version = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/11.10 Chromium/15.0.874.120 Chrome/15.0.874.120 Safari/535.2'
nop = NewOpener()
html_text = nop.open('http://tagger.steve.museum/steve/object/141913').read()
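For anyone on Python 3, where `urllib` and `urllib2` were merged into `urllib.request` and `FancyURLopener` is deprecated, the same idea can probably be expressed by setting the `User-Agent` header on a `Request`. This is a minimal sketch, not tested against the site above: the names `BROWSER_UA`, `build_browser_request` and `fetch` are my own, and the user-agent string is just the one from the snippet above.

```python
import urllib.request

# Same browser-like string as in the Python 2 snippet; the exact value is an
# assumption, the point is only that it does not look like a crawler.
BROWSER_UA = ('Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.2 '
              '(KHTML, like Gecko) Ubuntu/11.10 Chromium/15.0.874.120 '
              'Chrome/15.0.874.120 Safari/535.2')


def build_browser_request(url, user_agent=BROWSER_UA):
    """Create a Request that identifies itself with a browser User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Open the URL with the browser-like request and return the raw bytes."""
    with urllib.request.urlopen(build_browser_request(url)) as response:
        return response.read()
```

Usage would be something like `html_bytes = fetch('http://tagger.steve.museum/steve/object/141913')`, which returns bytes that still need to be decoded to text.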
Thank you all for your replies.