I'm using python-docx and python-docx-template to generate a multi-page document. Word on both macOS and Windows reports an error in the .docx file produced, yet Word can open the file if allowed to continue, and the document looks fine once opened. (On macOS, the error dialog reads "HRESULT 0x80004005 Location: Part: /word/document.xml, line 0, column 0".)
The .docx template is a pretty simple one-page document. The loop to construct the compound document is based on an answer to another question and is this simple Python code:
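    from docx import Document
    from docxtpl import DocxTemplate

    # template (path to the .docx template) and records_list are defined elsewhere
    overall_doc = Document()
    num_pages = len(records_list)
    for index, record in enumerate(records_list):
        # Render a fresh copy of the template for each record
        page = DocxTemplate(template)
        values = vars(records_list[index])
        page.render(values)
        if index < (num_pages - 1):
            page.add_page_break()
        # Copy every body element of the rendered page into the combined document
        for element in page.element.body:
            overall_doc.element.body.append(element)
    overall_doc.save('outputfile.docx')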
The values substituted into the template are UTF-8 strings with no special characters (and in particular, no ampersands or greater than/less than characters). I've verified the problem is not due to the string values being substituted into the template.
If you break the loop after the first page is created, no error results. If the loop is allowed to create even just 2 pages, the error in Word occurs. If I remove the page break code altogether, the error still occurs. If I add an extra page break at the end, the error still occurs.
I've tried to find a docx validation tool. The only thing I have been able to run is docx4j's OpenMainDocumentAndTraverse function, which, as far as I can tell, should report errors. But docx4j does not report any error with the output document.
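Another way to look inside the output without Word is to read /word/document.xml straight out of the .docx package (it is a zip archive) and list the direct children of the w:body element. The following is only a minimal sketch using the standard library. The helper names body_child_tags and misplaced_section_properties are hypothetical. The idea that a w:sectPr element anywhere other than the end of the body could be what Word objects to is an assumption based on the WordprocessingML schema, not something confirmed for this file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by /word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_child_tags(docx_path):
    """Return the local tag names of the direct children of <w:body>, in order."""
    with zipfile.ZipFile(docx_path) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return []
    # Strip the "{namespace}" prefix that ElementTree puts on each tag
    return [child.tag.split("}", 1)[-1] for child in body]


def misplaced_section_properties(tags):
    """Return the positions of every sectPr that is not the last body child.

    Assumption: the schema allows at most one sectPr directly under the body,
    and only as its final child.
    """
    last = len(tags) - 1
    return [i for i, tag in enumerate(tags) if tag == "sectPr" and i != last]


# Hypothetical usage against the file saved by the loop above:
# tags = body_child_tags("outputfile.docx")
# print(tags)
# print(misplaced_section_properties(tags))
```

If the second helper returns a non-empty list for the two-page output but an empty list for a document saved directly from the template, that would at least narrow down where the structure differs.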
What could cause this error? If my mistake is not obvious, how can I diagnose the reason that Word is complaining?