
I’m playing around with the Stack Overflow API using Python. I’m trying to decode the gzipped responses that the API gives.

import urllib, gzip

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
gzip.GzipFile(fileobj=url).read()

According to the urllib2 documentation, urlopen “returns a file-like object”.

However, when I run read() on the GzipFile object I’ve created using it, I get this error:

AttributeError: addinfourl instance has no attribute 'tell'

As far as I can tell, this is coming from the object returned by urlopen.

It doesn’t appear to have seek either, as I get an error when I do this:

url.read()
url.seek(0)

What exactly is this object, and how do I create a functioning GzipFile instance from it?

Paul D. Waite
  • `Content-Encoding: gzip` should be handled by the http library, but unfortunately it isn't. This is [issue 9500](http://bugs.python.org/issue9500) in Python's bug database, for the interested. – Magnus Hoff Nov 17 '10 at 14:09
  • @Magnus: cheers, good to know it’s at least in the bug tracker. – Paul D. Waite Nov 17 '10 at 14:28

3 Answers


The urlopen docs list the supported methods of the object that is returned. I recommend wrapping the object in another class that supports the methods that gzip expects.
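The wrapper approach might look like the following sketch. The class name `FileLikeWrapper` and the in-memory data are my own invention; in real use, the object returned by `urlopen` would take the place of the `BytesIO` stand-in. Older versions of `GzipFile` called `tell()` on the file object, so the wrapper tracks the read offset itself:

```python
import gzip
import io

class FileLikeWrapper:
    """Hypothetical wrapper: delegates read() to a raw stream and
    tracks the offset so tell() works, which is what GzipFile wanted."""
    def __init__(self, raw):
        self.raw = raw
        self.offset = 0

    def read(self, size=-1):
        data = self.raw.read(size)
        self.offset += len(data)
        return data

    def tell(self):
        return self.offset

# Demonstration with an in-memory "response" instead of a live URL:
compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode='wb') as gz:
    gz.write(b'{"badges": []}')
compressed.seek(0)

wrapped = FileLikeWrapper(compressed)
data = gzip.GzipFile(fileobj=wrapped).read()
print(data)  # b'{"badges": []}'
```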

Other option: call the read method of the response object and put the result in a StringIO object (which should support all the methods that gzip expects). This may be a little more expensive, though.

E.g.

import gzip
import json
import StringIO
import urllib

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
url_f = StringIO.StringIO(url.read())
g = gzip.GzipFile(fileobj=url_f)
j = json.load(g)
hd1
stefanw
  • Wrapping it in a `StringIO` object gets past that error, but I still get an `IOError: Not a gzipped file` – Thomas K Nov 17 '10 at 13:16
  • 1
    @ThomasK It works find for me. Are you passing `url.read()` to the `StringIO` constructor or just `url`? The latter fails. – aaronasterling Nov 17 '10 at 13:21
  • Excellent, cheers. Unutbu’s answer was great too, but I’ll go with this one as I’m guessing the `StringIO` solution is more backwards compatible. – Paul D. Waite Nov 17 '10 at 14:49
  • 2
    Is there a way to do this without reading the entire `urlopen` response in one go? I'm looking to use something like this in a situation where the payload of the `urlopen` is very large (GBs), so I would like to be able to use this to stream-parse as data comes in, rather than blocking on the whole http request. – Kevin Oct 19 '15 at 15:21
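On the streaming question in the last comment: one way to avoid buffering the whole response is `zlib.decompressobj`, which can decompress gzip data chunk by chunk (passing `16 + zlib.MAX_WBITS` tells zlib to expect a gzip header). A minimal sketch, using an in-memory payload in place of the network stream:

```python
import gzip
import io
import zlib

# Build some gzipped data in memory to stand in for the HTTP response body.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b'line one\nline two\n')
payload = buf.getvalue()

# 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
output = b''
for i in range(0, len(payload), 8):  # feed 8-byte chunks, as if from the network
    output += decomp.decompress(payload[i:i + 8])
output += decomp.flush()
print(output)  # b'line one\nline two\n'
```

In real use, the 8-byte chunks would be replaced by `response.read(chunk_size)` calls in a loop.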
import urllib2
import json
import gzip
import io

url = 'http://api.stackoverflow.com/1.0/badges/name'
page = urllib2.urlopen(url)
gzip_filehandle = gzip.GzipFile(fileobj=io.BytesIO(page.read()))
json_data = json.loads(gzip_filehandle.read())
print(json_data)

io.BytesIO is for Python 2.6+. For older versions of Python, you could use cStringIO.StringIO.

unutbu

Here is an update to @stefanw's answer, for those who might find it too expensive to hold the whole response in memory.

Thanks to this article (https://www.enricozini.org/blog/2011/cazzeggio/python-gzip/, which explains why gzip doesn't work with urllib response objects), the solution is to use Python 3:

import urllib.request
import gzip

response = urllib.request.urlopen('http://api.stackoverflow.com/1.0/badges/name')
with gzip.GzipFile(fileobj=response) as f:
    for line in f:
        print(line)
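This works because in Python 3, GzipFile in read mode only calls read() on the fileobj it is given, so the response object no longer needs tell() or seek(). A self-contained way to see this (ReadOnly is a hypothetical stand-in for the response object, exposing nothing but read()):

```python
import gzip
import io

class ReadOnly:
    """Hypothetical stand-in for the urlopen response: only read() exists."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)

# Gzip some sample data in memory.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
    gz.write(b'hello\nworld\n')

# Python 3's GzipFile is happy with a read()-only object.
lines = list(gzip.GzipFile(fileobj=ReadOnly(raw.getvalue())))
print(lines)  # [b'hello\n', b'world\n']
```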
CKLu