I am trying to merge a number of docx documents using docxcompose. Each document typically consists of a single table, in some cases several tables, and occasionally pictures. I am using the basic script offered by Shashank (combine word document using python docx):
from docxcompose.composer import Composer
from docx import Document as Document_compose
# filename_master is the name of the file you want to merge the other docx files into
master = Document_compose(filename_master)
composer = Composer(master)
# filename_second_docx is the name of the second docx file
doc2 = Document_compose(filename_second_docx)
# Append doc2 to the master using composer.append
composer.append(doc2)
# Save the combined docx under a new name
composer.save("combined.docx")
It works beautifully for the most part, and even works fine for images. However, when I append the final document, the header of the table in that last document appears at the bottom of the same page as the document before it. I need each appended document to start on a clean new page.
I have tried adding page breaks, as Akash suggested in the comments of that post:
master = Document_compose(filename_master)
composer = Composer(master)
# Page break after the master, then after each appended document
master.add_page_break()
composer.append(doc2)
master.add_page_break()
composer.append(doc3)
master.add_page_break()
This solves the initial problem but creates a new one: unnecessary page breaks (and therefore empty pages) appear earlier in the merged document.
I've also tried iterating through the documents beforehand, adding page breaks to each one, and then composing using the amended docs. Again, this does the job but produces unnecessary page breaks.
It seems I need a way to add page breaks conditionally, or to delete empty pages (for which I could not find a solution). However, my limited understanding is that page breaks are not always encoded in the XML; they can also be produced by the renderer when a page simply fills up.
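To make that distinction concrete, here is a minimal sketch of what can and cannot be detected in the XML. It uses only the standard library; the function name `count_page_break_markers` and the `SAMPLE` fragment are made up for illustration and are not part of python-docx or docxcompose. It counts hard breaks (`w:br` with `w:type="page"`, which is what `add_page_break()` writes), the paragraph property `w:pageBreakBefore`, and `w:lastRenderedPageBreak`, which is only a hint left by the last application that laid out the file. A page that overflows naturally leaves no reliable marker at all.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # ElementTree writes namespaced names as "{uri}local"


def count_page_break_markers(document_xml: str) -> dict:
    """Count page-break-related elements in a WordprocessingML string."""
    root = ET.fromstring(document_xml.strip())
    counts = {"explicit_break": 0, "page_break_before": 0, "last_rendered": 0}
    for el in root.iter():
        if el.tag == W + "br" and el.get(W + "type") == "page":
            # Hard page break stored inside a run
            counts["explicit_break"] += 1
        elif el.tag == W + "pageBreakBefore":
            # Paragraph property; w:val="0" or "false" switches it off
            if el.get(W + "val", "1") not in ("0", "false"):
                counts["page_break_before"] += 1
        elif el.tag == W + "lastRenderedPageBreak":
            # Layout hint from the last renderer, not a real break
            counts["last_rendered"] += 1
    return counts


# Hypothetical fragment, hand-written for this example
SAMPLE = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>End of first document</w:t></w:r></w:p>
    <w:p><w:r><w:br w:type="page"/></w:r></w:p>
    <w:p><w:pPr><w:pageBreakBefore/></w:pPr><w:r><w:t>Second document</w:t></w:r></w:p>
    <w:p><w:r><w:lastRenderedPageBreak/><w:t>Text after a soft break</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

print(count_page_break_markers(SAMPLE))
```

For a real file, the same function could be fed the main document part, which is typically stored as `word/document.xml` inside the .docx zip archive (for example via `zipfile.ZipFile(path).read("word/document.xml").decode("utf-8")`). This only tells me which breaks are explicit; it does not tell me where the renderer will actually end a page.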
Any advice is appreciated.