
For some reason urllib is corrupting documents that I download from a website. The URL is definitely correct. The documents download with their correct names, but they appear corrupted once opened. I found that the download is encoded with deflate, so I attempted to decode it, but I keep getting one of these two errors, depending on how I modify the "zlib" part.

Errors:

Error -2 while preparing to decompress data: inconsistent stream state

or

Error -3 while decompressing data: incorrect header check

Here is a code snippet showing where the problem is:

    def download_file(url, name):
        try:
            f = urllib.urlopen(url)
            fh = open(name, 'wb')
            if f.info().get('content-encoding') == 'deflate':
                fh = zlib.decompress(f.read(), 16 + zlib.MAX_WBITS)
            #fh.write(f.read())
            fh.close()
            print "  File Downloaded : ", name

        except Exception:
            raise
Thomas
  • `gzip` and `deflate` are treated differently when decompressing. – Martijn Pieters Jun 25 '15 at 09:51
  • See [Python: Inflate and Deflate implementations](http://stackoverflow.com/q/1089662) for deflate code. – Martijn Pieters Jun 25 '15 at 09:52
  • I've said this in a previous comment but I'll reiterate it here: use the [`requests` library](http://docs.python-requests.org/en/latest/) instead, it handles decompression transparently for you. – Martijn Pieters Jun 25 '15 at 09:53
  • How come your code snippet has an `except` without a `try`? – tobias_k Jun 25 '15 at 09:55
  • Could you give me an example, based on my code above, that uses the requests library to decode deflate? – Thomas Jun 25 '15 at 09:57
  • @tobias_k Sorry, I left that bit out. I've added it now. – Thomas Jun 25 '15 at 10:00
  • @MartijnPieters Could you give me an example, based on my code above, that uses the requests library to decode deflate? – Thomas Jun 25 '15 at 10:30
  • @RitchieRamnial: you don't need to do anything; accessing the response body via `.content` or `.text` gives you the decompressed content. If you want to download to a file object (streaming) and want to use the raw socket, you can set a flag to have it decompress as you read as well. See [How to download image using requests](http://stackoverflow.com/q/13137817) – Martijn Pieters Jun 25 '15 at 10:33
  • @MartijnPieters I still can't get it to work. The aim is to download the file from the link, but the file that downloads is still encoded and unreadable. Could you please simplify it for me and show me an example based on my code? – Thomas Jun 25 '15 at 11:05
  • @RitchieRamnial: the code in the other answer *is* the sample I'd give. If that doesn't produce non-corrupted results, the server is not giving you the headers the client needs to decode the information. You'd have to manually deflate / unzip, for which there are already answers here on SO that tell you how. – Martijn Pieters Jun 25 '15 at 11:09
  • @MartijnPieters Could you link me to the manual process that you were talking about? – Thomas Jun 25 '15 at 14:39
  • @RitchieRamnial: I meant explicitly in your own code decompress the content. You already found the gzip process, and in my second comment I linked to the deflate procedure. – Martijn Pieters Jun 25 '15 at 14:40
  • @MartijnPieters Right, I've had a good look at the manual deflate code, but I still can't incorporate it into my download code. I would really appreciate it if you could give me a working example! – Thomas Jun 25 '15 at 14:59
  • @RitchieRamnial: see the [`urllib3` source code](https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/response.py#L18-L44) then. – Martijn Pieters Jun 25 '15 at 15:02
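
For reference, the manual decompression discussed in these comments might look like the sketch below. It rests on a few assumptions: the server labels the body `Content-Encoding: deflate` but may send either a zlib-wrapped stream or a raw deflate stream, and the body has already been read into a byte string (for example with `f.read()`). Note that `16 + zlib.MAX_WBITS` tells zlib to expect gzip framing, which is probably why deflate data fails with "incorrect header check". The names `inflate` and `save_body` are hypothetical helpers, not part of any library.

```python
import zlib


def inflate(data):
    """Decompress a 'deflate' HTTP body, trying zlib framing first."""
    try:
        # Positive MAX_WBITS expects a zlib header and checksum trailer.
        return zlib.decompress(data, zlib.MAX_WBITS)
    except zlib.error:
        # Negative wbits selects a raw deflate stream with no header.
        return zlib.decompress(data, -zlib.MAX_WBITS)


def save_body(body, content_encoding, name):
    """Write the body to a file, decompressing it first if needed."""
    if content_encoding == 'deflate':
        body = inflate(body)
    # Keep the file handle separate from the decompressed bytes.
    with open(name, 'wb') as fh:
        fh.write(body)
```

In the question's function this would amount to calling something like `save_body(f.read(), f.info().get('content-encoding'), name)`, so that the decompressed bytes are written to the open file instead of replacing the file handle.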

0 Answers