I'm using python-docx and python-docx-template to generate a multi-page document. Word on both macOS and Windows reports an error in the resulting .docx file, yet it opens the file if allowed to continue, and the document then looks fine. (On macOS, the error dialog reads "HRESULT 0x80004005 Location: Part: /word/document.xml, line 0, column 0".)

The .docx template is a fairly simple one-page document. The loop that builds the combined document is based on an answer to another question; it is this Python code:

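from docx import Document
from docxtpl import DocxTemplate

# `template` and `records_list` are defined elsewhere.
overall_doc = Document()
num_pages = len(records_list)
for index, record in enumerate(records_list):
    page = DocxTemplate(template)
    # Use the record's attributes as the template context.
    page.render(vars(record))
    # Separate pages with a break, except after the last one.
    if index < (num_pages - 1):
        page.add_page_break()
    # Move the rendered page's body elements into the combined document.
    for element in page.element.body:
        overall_doc.element.body.append(element)
overall_doc.save('outputfile.docx')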

The values substituted into the template are UTF-8 strings with no special characters (and in particular, no ampersands or greater than/less than characters). I've verified the problem is not due to the string values being substituted into the template.

If you break the loop after the first page is created, no error results. If the loop is allowed to create even just 2 pages, the error in Word occurs. If I remove the page break code altogether, the error still occurs. If I add an extra page break at the end, the error still occurs.
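One way to see what changes between the one-page and two-page cases is to list the direct children of the w:body element in the output file. The sketch below uses only the standard library; the helper name body_child_counts is hypothetical and not part of python-docx. The WordprocessingML schema allows at most one w:sectPr directly under w:body, and only as its last child, so a count above one is worth checking, though that is an assumption about where to look and not a confirmed cause of this error.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the main WordprocessingML elements (w:body, w:p, w:sectPr, ...).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_child_counts(docx_path):
    """Count the direct children of w:body by local tag name.

    `docx_path` may be a file name or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    return Counter(child.tag.split("}", 1)[-1] for child in body)
```

Calling body_child_counts('outputfile.docx') on the one-page and the two-page output and comparing the two results shows which element types the second iteration adds.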

I've tried to find a docx validation tool. The only one I have been able to run is docx4j's OpenMainDocumentAndTraverse function, which, as far as I can tell, should report errors. But docx4j does not report any error in the output document.

What could cause this error? If my mistake is not obvious, how can I diagnose the reason that Word is complaining?

mhucka
  • The Open XML SDK Productivity Tool will let you inspect the XML of the zip package. I don't know the language you're using, but it looks to me as if your loop may be inserting body elements? If yes, that's the problem - there can be only one body element in a document. – Cindy Meister Aug 06 '18 at 20:07
  • @CindyMeister Thanks. The language is Python (the question was tagged with Python, but I'll edit the question to mention it in the text). I'm afraid I'm not familiar with compiling on Windows and can't build the Open XML SDK Productivity Tool – is there a ready-to-run version somewhere? Finally, the body elements are being appended to; the code does not add a new body. – mhucka Aug 06 '18 at 20:25
  • Sorry about the misunderstanding: I meant I don't know Python(-docx) so I'm not sure what I'm seeing when I look at your code. The Productivity Tool can be downloaded from the MS site, look for the SDK version 2.5 - that's already built. You can compare the code's result with the version Word repaired to see how they differ. – Cindy Meister Aug 06 '18 at 20:34
  • (Slapping head) I can't believe I didn't think to compare the results to the repaired file. Thanks, I'll do that. – mhucka Aug 06 '18 at 20:35
  • opc-diag is a pure-Python (runs on anything) Open XML package inspector: https://opc-diag.readthedocs.io/en/latest/. It's what we use on the python-docx and python-pptx projects to investigate .docx and .pptx files and has some comparison features. – scanny Aug 07 '18 at 20:08
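
The comparison suggested in the comments (generated file versus the copy Word repaired) can also be sketched with the Python standard library alone, without any of the tools mentioned. This is a minimal sketch, not a replacement for opc-diag: the function names pretty_part and diff_docx are hypothetical, and 'repaired.docx' stands for whatever name the repaired copy was saved under.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_part(docx_path, part="word/document.xml"):
    """Return one XML part of a .docx package as pretty-printed lines."""
    with zipfile.ZipFile(docx_path) as package:
        raw = package.read(part)
    # Re-indent so that the diff works element by element, not on one long line.
    return minidom.parseString(raw).toprettyxml(indent="  ").splitlines()


def diff_docx(generated, repaired, part="word/document.xml"):
    """Unified diff of the same part taken from two .docx packages."""
    return list(
        difflib.unified_diff(
            pretty_part(generated, part),
            pretty_part(repaired, part),
            fromfile="generated",
            tofile="repaired",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical file names: the script's output and Word's repaired copy.
    for line in diff_docx("outputfile.docx", "repaired.docx"):
        print(line)
```

Lines prefixed with "-" appear only in the generated file and lines prefixed with "+" only in the repaired one, so the elements Word removed or moved during repair show up directly in the diff.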
