
I'm writing an Azure function in Python 3.9 that needs to accept a base64 string created from a known .docx file which will serve as a template. My code will decode the base64, pass it to a BytesIO instance, and pass that to docx.Document(). However, I'm receiving an exception BadZipFile: File is not a zip file.

Below is a slimmed down version of my code. It fails on document = Document(bytesIODoc). I'm beginning to think it's an encoding/decoding issue, but I don't know nearly enough about it to get to the solution.

from docx import Document
from io import BytesIO
import base64


class ParseBody:
    def __init__(self, body):
        self.template = str(body['template'])
        self.contents = body['data']

    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesIODoc = BytesIO(b64Doc)
        document = Document(bytesIODoc)  # raises BadZipFile here
        return document

    def run(self):
        self.document = self._decode_template()


var = {
    'template': 'Some_base64_from_docx_file',
    'data': {'some': 'data'}
}

run_stuff = ParseBody(body=var)
output = run_stuff.run()

I've also tried the following change to _decode_template and am getting the same exception. This is running base64.decodebytes() on the b64Doc object and passing that to BytesIO instead of directly passing b64Doc.

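def _decode_template(self):
    b64Doc = base64.b64decode(self.template)
    bytesDoc = base64.decodebytes(b64Doc)  # decodes the already-decoded bytes a second time
    bytesIODoc = BytesIO(bytesDoc)
    document = Document(bytesIODoc)  # same BadZipFile exception
    return document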
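The second attempt decodes twice. base64.b64decode already returns the raw .docx bytes, and base64.decodebytes performs the same kind of base64 decode, so running it on that output treats binary zip data as if it were base64 text. A small stdlib-only sketch, using a tiny in-memory zip as a stand-in for the .docx (an assumption for illustration, since a .docx is a zip package), shows that one decode round-trips and a second one does not:

```python
import base64
import binascii
import io
import zipfile

# Tiny in-memory zip standing in for a .docx file (illustration only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
raw = buf.getvalue()

encoded = base64.b64encode(raw)

# One decode gives back exactly the original archive bytes.
once = base64.b64decode(encoded)
print(once == raw, zipfile.is_zipfile(io.BytesIO(once)))  # True True

# A second decode treats the binary zip data as base64 text: bytes outside the
# base64 alphabet are silently dropped and the rest decodes into unrelated
# bytes, or binascii.Error is raised, depending on the input.
twice = None
try:
    twice = base64.decodebytes(once)
    print(twice == raw, zipfile.is_zipfile(io.BytesIO(twice)))
except binascii.Error as exc:
    print("second decode failed:", exc)
```

Either way the bytes handed to BytesIO are no longer the archive, which is consistent with getting the same BadZipFile exception.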

To be sure that this is possible at all, I successfully tried the following on the exact same .docx file: I can open the document in Python, base64 encode it, decode it back into bytes, pass that to a BytesIO instance, and pass that to docx.Document.

from io import BytesIO
import base64
from docx import Document

file = r'WordTemplate.docx'

with open(file, 'rb') as f:
    doc = f.read()
b64Doc = base64.b64encode(doc)
bytesDoc = base64.decodebytes(b64Doc)

bytesIODoc = BytesIO(bytesDoc)

newDoc = Document(bytesIODoc)
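In this working snippet b64encode returns bytes and everything stays bytes. When the base64 travels as text instead (for example in a JSON body sent to the Azure function), the usual conversion is .decode('ascii'). Calling str() on the bytes object gives its repr, "b'...'", and the default non-validating b64decode silently drops the quotes but keeps the leading b as data, shifting every byte after it. Given the str(body['template']) call in ParseBody, this is one possible cause of the BadZipFile error, but it is only an assumption about how the caller built the string. A sketch, again with a tiny in-memory zip standing in for WordTemplate.docx:

```python
import base64
import binascii
import io
import zipfile

# Tiny in-memory zip standing in for WordTemplate.docx (illustration only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
raw = buf.getvalue()

encoded = base64.b64encode(raw)      # bytes

good_text = encoded.decode("ascii")  # plain base64 text, safe to put in JSON
bad_text = str(encoded)              # the repr "b'...'", not the data itself

# Decoding the proper text gives the archive back.
assert base64.b64decode(good_text) == raw

# Decoding the repr keeps the leading 'b' as data and shifts everything after it
# (or raises binascii.Error, depending on the padding).
from_repr = None
try:
    from_repr = base64.b64decode(bad_text)
except binascii.Error as exc:
    print("decode failed:", exc)
print(from_repr == raw)  # False
```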

I've tried countless other solutions to no avail, and they have led me further away from a resolution. This is the closest I've gotten. Any help is greatly appreciated!

1 Answer


The answer to the linked question "How to generate a DOCX in Python and save it in memory?" actually helped me resolve my own issue.

All I had to do was change document = Document(bytesIODoc) to the following:

document = Document()
document.save(bytesIODoc)
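One caveat: docx.Document() with no argument loads python-docx's built-in default template, so after this change document is a new blank document. The decoded template bytes in bytesIODoc are never parsed, and save() then writes the blank document into that same buffer. If the template's contents are needed, the decoded stream has to be passed to Document(...) itself, which raises BadZipFile whenever the decoded bytes are not a zip archive. Below is a minimal stdlib-only sketch that fails early with a clearer message before anything reaches python-docx. The helper name decode_template_stream is hypothetical, and tolerating line-wrapped base64 is a design assumption:

```python
import base64
import binascii
import io
import zipfile


def decode_template_stream(b64_text: str) -> io.BytesIO:
    """Hypothetical helper: decode a base64 .docx template into a seekable stream.

    Raises ValueError if the text is not strict base64 or does not decode to a
    zip archive (a .docx file is a zip package).
    """
    # Remove whitespace so line-wrapped base64 (e.g. from encodebytes) is accepted.
    compact = "".join(b64_text.split())
    try:
        # validate=True rejects characters outside the base64 alphabet (such as
        # the quotes in "b'...'") instead of silently dropping them.
        data = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"template is not valid base64: {exc}") from exc

    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        raise ValueError(
            f"decoded template is not a zip/.docx archive; first bytes: {data[:8]!r}"
        )
    stream.seek(0)  # is_zipfile moves the stream position; rewind before reuse
    return stream
```

With python-docx installed, the result would be used as document = Document(decode_template_stream(self.template)). After filling in the template, passing a separate BytesIO to document.save() yields the output bytes without touching the input stream.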