I am trying to make an in-memory zip file, which contains a bunch of JSON files. I am struggling to upload it to S3 as a file object, receiving a rather strange error. Here is my code:

import boto3
import zipfile
import json
import os
from io import BytesIO

session = boto3.session.Session(
        aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
        aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'))
client = session.client('s3')

data = {'test1.json': {'a': 1, 'b': 2},
        'test2.json': {'x': 3, 'y': 4}}

zip_buffer = BytesIO()
zf = zipfile.ZipFile(zip_buffer, 'w')
for filename, d in data.items():
    zf.writestr(filename, json.dumps(d, indent=4))

client.upload_fileobj(zf, os.environ.get('S3_BUCKET'), 'test_zip.zip')

This gives me:

KeyError: 'There is no item named 8388608 in the archive'

How and why is this happening? Of course there is no item 8388608 in the archive - I haven't put it there.


EDIT

If I save the file locally instead of in-memory and then re-open it, it works fine. Should I be using tempfile perhaps?

turnip
  • In-memory processing is always confusing. Check this out: https://stackoverflow.com/questions/3610221/how-to-create-an-in-memory-zip-file-with-directories-without-touching-the-disk – mootmoot Mar 23 '18 at 15:42
  • @mootmoot I am already creating the in-memory zip fine, I think this may be more of a `boto3` issue. – turnip Mar 23 '18 at 15:44
  • This is not a boto3 issue. You need to pass the correct file object, which is definitely not `zf` but the buffer object. That's why I say in-memory processing is confusing. – mootmoot Mar 23 '18 at 15:46
  • I found what's going on. I had already tried passing `zip_buffer`, and the upload worked, but it resulted in invalid zip files (Windows / 3rd party tools couldn't open them). The issue was that I was not closing `zf` before uploading. Interesting artifact... – turnip Mar 23 '18 at 15:51
  • Well, you can post that as an answer; in fact, the solution is already there. – mootmoot Mar 23 '18 at 15:55

1 Answer

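The issue was a rather odd one. First, it is zip_buffer that needs to be passed to upload_fileobj, not zf. Second, you must close the ZipFile object before uploading: the archive's central directory is only written on close, so uploading earlier produces corrupted zip files that cannot be opened. Finally, rewind the buffer with seek(0) so the upload reads from the first byte.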
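
As for the KeyError itself, a likely explanation (an inference from the number in the message, not something confirmed in this thread) is that upload_fileobj simply calls .read(n) on whatever object it is handed. ZipFile.read() takes a member name rather than a byte count, so the requested size gets looked up as a file name. 8388608 is 8 * 1024 * 1024, which matches boto3's default 8 MB multipart threshold. A minimal sketch that reproduces the message without touching S3 (the name chunk_size is only illustrative):

```python
import json
import zipfile
from io import BytesIO

zip_buffer = BytesIO()
zf = zipfile.ZipFile(zip_buffer, 'w')
zf.writestr('test1.json', json.dumps({'a': 1, 'b': 2}, indent=4))

# Assumption: this is the byte count the uploader asked for (8 MB).
chunk_size = 8 * 1024 * 1024  # 8388608

try:
    # ZipFile.read(name) looks up an archive member, so the number
    # is treated as a member name instead of a number of bytes.
    zf.read(chunk_size)
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(error_message)
zf.close()
```

So the ZipFile object happens to have a read() method, but it is not a file-like object in the sense upload_fileobj expects.
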

import boto3
import zipfile
import json
import os
from io import BytesIO

session = boto3.session.Session(
        aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
        aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'))
client = session.client('s3')

data = {'test1.json': {'a': 1, 'b': 2},
        'test2.json': {'x': 3, 'y': 4}}

zip_buffer = BytesIO()
zf = zipfile.ZipFile(zip_buffer, 'w')
for filename, d in data.items():  # .items() works on both Python 2 and 3
    zf.writestr(filename, json.dumps(d, indent=4))

zf.close()  # important! writes the central directory that makes the archive valid
zip_buffer.seek(0)  # rewind so the upload reads from the start of the buffer

# Pass the underlying buffer, not the ZipFile object
client.upload_fileobj(zip_buffer, os.environ.get('S3_BUCKET'), 'test_zip.zip')
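
If you want to convince yourself of the effect of close() and seek(0) without uploading anything, here is a small local check. It is only a sketch using the standard library; the variable names are made up for illustration, and zipfile.is_zipfile() is used as a rough stand-in for "other tools can open this archive".

```python
import json
import zipfile
from io import BytesIO

zip_buffer = BytesIO()
zf = zipfile.ZipFile(zip_buffer, 'w')
zf.writestr('test1.json', json.dumps({'a': 1, 'b': 2}, indent=4))

# Snapshot of the bytes that would be uploaded while zf is still open.
valid_before_close = zipfile.is_zipfile(BytesIO(zip_buffer.getvalue()))

# close() appends the central directory; the BytesIO itself stays open.
zf.close()
valid_after_close = zipfile.is_zipfile(BytesIO(zip_buffer.getvalue()))

# After close() the position is at the end, so a plain read() gets nothing.
bytes_without_rewind = zip_buffer.read()
zip_buffer.seek(0)
bytes_after_rewind = zip_buffer.read()

print(valid_before_close, valid_after_close)
print(len(bytes_without_rewind), len(bytes_after_rewind))
```

Wrapping the writes in `with zipfile.ZipFile(zip_buffer, 'w') as zf:` would close the archive automatically at the end of the block, which makes it harder to forget this step.
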