
I am trying to use boto to open a .zip file that I have in S3. I want to work with the data directly and avoid creating temporary files.

import StringIO
import boto

conn = boto.connect_s3()
my_bucket = conn.get_bucket('my_bucket')
# list() takes a key prefix; collect the keys that match it
my_list = [ele for ele in my_bucket.list('my_file.zip')]
# download the first matching key into an in-memory buffer (no temp file)
f = StringIO.StringIO()
my_list[0].get_file(f)
f.seek(0)  # rewind so the buffer can be read from the start

If the file were not zipped, I would just use:

my_content = my_list[0].get_contents_as_string()

but since it is zipped, I am getting garbage.
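The "garbage" is the compressed container itself, so the first step is knowing which container you have: a .zip archive and a gzip stream need different modules. A minimal sketch, assuming the object body is already in memory as bytes (for example, what `get_contents_as_string()` returns); `detect_compression` is a hypothetical helper that only checks the leading magic bytes, and the sample bodies are built in memory rather than fetched from S3.

```python
import gzip
import io
import zipfile


def detect_compression(body):
    """Guess the container format of a raw object body from its first bytes."""
    if body[:2] == b"\x1f\x8b":
        return "gzip"  # every gzip stream starts with 0x1f 0x8b
    if body[:4] == b"PK\x03\x04":
        return "zip"  # local file header signature of an ordinary .zip archive
    return "unknown"


# Hypothetical sample bodies standing in for downloaded S3 objects.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", "plain text")

gz_buf = io.BytesIO()
with gzip.GzipFile(fileobj=gz_buf, mode="wb") as gz:
    gz.write(b"plain text")

print(detect_compression(zip_buf.getvalue()))
print(detect_compression(gz_buf.getvalue()))
print(detect_compression(b"plain text"))
```

The magic-byte check is deliberately simple: it does not recognize self-extracting or empty zip archives, which start differently.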

An answer to this question does what I want using gzip (I borrowed part of my attempt from it), but I can't find anything equivalent for zip. I tried using zipfile.ZipFile, but its read, extract, and extractall methods don't seem to do what I want.
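Unlike gzip, a .zip file is an archive of named members: `extract()` and `extractall()` write members to disk, while `ZipFile.read(name)` returns one member's bytes. Because `ZipFile` accepts any seekable file-like object, the downloaded bytes can be wrapped in a buffer instead of a temp file. A minimal sketch, written for Python 3 (`io.BytesIO` also exists in Python 2.7); `read_zip_members` is a hypothetical helper, and the in-memory archive stands in for the S3 object body.

```python
import io
import zipfile


def read_zip_members(zip_bytes):
    """Return a dict mapping each member name of an in-memory .zip to its bytes."""
    # Wrapping the bytes in BytesIO gives ZipFile a seekable file object
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical stand-in for the S3 object body: a small zip built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.txt", "hello from inside the zip")
zip_bytes = buf.getvalue()

contents = read_zip_members(zip_bytes)
print(contents["data.txt"])
```

With boto, `zip_bytes` would presumably come from `my_list[0].get_contents_as_string()`; that wiring is an assumption and is not exercised here.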

Asked by Akavall (edited by Community)

1 Answer


You should look into the Python module gzip:

https://docs.python.org/2/library/gzip.html

You should be able to use StringIO with gzip:

from boto.s3.connection import S3Connection
import gzip
from StringIO import StringIO

s3_conn = S3Connection()  # assuming your .boto has been set up
bucket = s3_conn.get_bucket('my_bucket')
# wrap each key's bytes in an in-memory file object so GzipFile can decompress it
my_list = [gzip.GzipFile(fileobj=StringIO(ele.get_contents_as_string()))
           for ele in bucket.list()]
# for readability I pulled this out
for item in my_list:
    data = item.read()  # decompressed contents of one object

For readability, the list comprehension should probably be broken up, but I followed your original posting for comparison.

Good luck!

Answered by cgseller (edited by anmol)