
I have a piece of code which creates a zip file successfully, but I need to split this file if its size is more than 1MB.

I have this code but it doesn't work:

    import zipfile
    from io import BytesIO
    from lxml import etree
    from split_file_reader.split_file_writer import SplitFileWriter

    # make element tree
    tree = etree.ElementTree(batch_element)

    # make xml file and write it in stream
    xml_object = BytesIO()
    tree.write(xml_object, pretty_print=True, xml_declaration=False, encoding="utf-8")
    xml_file = xml_object.getvalue()

    final = BytesIO()

    with SplitFileWriter(final, 1_000_000) as sfw:
        with zipfile.ZipFile(sfw, "a") as zip_file:
            zip_file.writestr('Batch.xml', xml_file)

I want to retrieve the split file as bytes. The zipping part is working, but the splitting doesn't.

ekhumoro
Giorgi Injgia
    You seem to be using https://pypi.org/project/split-file-reader/; you should probably [edit] your question to include this information. – tripleee Aug 30 '21 at 11:47

2 Answers


Read the docs for the module you are using: https://pypi.org/project/split-file-reader

The usage instructions are in there.

EDIT: This is an example:

    import os
    import zipfile
    from split_file_reader.split_file_writer import SplitFileWriter

    with SplitFileWriter("split.zip.", 500_000) as sfw:
        with zipfile.ZipFile(file=sfw, mode='w') as zipf:
            for root, dirs, files in os.walk("./"):
                for file in files:
                    if file.startswith("random_payload"):
                        zipf.write(os.path.join(root, file))
Psuedodoro
  • I have seen the docs, but there isn't much there since it's a new library. The example here is a simple version that saves the files to some location, but I want to retrieve them as a bytes variable. – Giorgi Injgia Aug 30 '21 at 12:06

According to the split_file_reader docs, the first argument of SplitFileWriter can be a generator that produces file-like objects. That will allow you to split the zip-file into a list of BytesIO chunks.

Here is a working example script:

    import zipfile
    from io import BytesIO
    from lxml import etree
    from split_file_reader.split_file_writer import SplitFileWriter

    # make element tree
    # tree = etree.ElementTree(batch_element)
    tree = etree.parse('/tmp/test.xml')

    # make xml file and write it in stream
    xml_object = BytesIO()
    tree.write(xml_object, pretty_print=True, xml_declaration=False, encoding="utf-8")
    xml_file = xml_object.getvalue()

    chunks = []

    def gen(lst):
        while True:
            lst.append(BytesIO())
            yield lst[-1]

    with SplitFileWriter(gen(chunks), 1_000_000) as sfw:
        with zipfile.ZipFile(sfw, "w") as zip_file:
            zip_file.writestr('Batch.xml', xml_file)

    for i, chunk in enumerate(chunks):
        print(f'chunk {i}: {len(chunk.getvalue())}')

Output:

    chunk 0: 1000000
    chunk 1: 1000000
    chunk 2: 1000000
    chunk 3: 1000000
    chunk 4: 1000000
    chunk 5: 887260
ekhumoro
  • [Here is a corrected link to the docs](https://gitlab.com/Reivax/split_file_reader/-/tree/master/src/split_file_writer#arguments) – the package structure has changed. I am the author of this project, and this is an acceptable answer. You may also yield the same BytesIO object forever and simply truncate it each time, doing your post-processing in the generator immediately as the BytesIO fills, instead of after the full file is written, thus saving memory. – Reivax Aug 30 '22 at 00:31
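Reivax's memory-saving suggestion in the comment above can be sketched roughly as follows. This is a sketch under my own assumptions, not code from the library docs: `process` is a hypothetical post-processing step (here it just records each chunk's size in place of, say, an upload), and dummy payload data stands in for the XML. The idea is that the generator yields the same `BytesIO` every time; control returns to the generator exactly when the writer asks for the "next" file, so that is the moment the just-completed chunk can be processed and the buffer reset:

    import zipfile
    from io import BytesIO
    from split_file_reader.split_file_writer import SplitFileWriter

    chunk_sizes = []  # stand-in for real post-processing (e.g. upload each chunk)

    def process(buf):
        # hypothetical post-processing hook: here we only record the chunk size
        chunk_sizes.append(len(buf.getvalue()))

    def reuse_gen(buf):
        while True:
            yield buf
            # control returns here when the writer requests the next file,
            # i.e. the current chunk is full: process it, then reset the buffer
            process(buf)
            buf.seek(0)
            buf.truncate()

    buf = BytesIO()
    payload = b'x' * 2_500_000  # ~2.5 MB of dummy data in place of the XML

    with SplitFileWriter(reuse_gen(buf), 1_000_000) as sfw:
        with zipfile.ZipFile(sfw, 'w') as zf:
            zf.writestr('payload.bin', payload)

    # the final (partial) chunk is still sitting in the buffer after the
    # writer closes, so process it here if there is anything left
    if buf.getvalue():
        process(buf)

This way only one chunk's worth of data is held in memory at a time, instead of the full list of `BytesIO` chunks built up in the answer above.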