
I need to identify where each page ends in a Word document and mark it with additional text like PAGEEND_<>.

I am able to iterate over paragraphs using the following code:

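from docx import Document

inputfile = 'test.docx'
document = Document(inputfile)
output = Document()
for number, paragraph in enumerate(document.paragraphs, start=1):
    # Write paragraph text into new document
    output.add_paragraph(paragraph.text)
    # Write additional text as PARAEND_<<ParaNumber>>
    output.add_paragraph('PARAEND_{}'.format(number))
# 'output.docx' is a placeholder name for the new document
output.save('output.docx')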

How do I do the same thing for every page?

Bonson

1 Answer

The short answer is that it's not possible to do reliably within python-docx because determining page boundaries is a function of the (runtime) page rendering engine and is not represented in the .docx file itself.

There are more details in the answer to this question:
Page number python-docx

and this one:
How to identify page breaks using python-docx from docx
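
For what it's worth, a .docx file can still carry two kinds of hints: explicit page breaks the author inserted (`w:br` with `w:type="page"`) and `w:lastRenderedPageBreak` elements that Word may cache the last time it laid the document out. Below is a minimal Python 3 sketch that reads those hints straight from the XML using only the standard library, so it does not depend on python-docx at all. The function name `mark_page_hints` is made up for illustration. It looks only at top-level body paragraphs, it may count the same boundary twice when both kinds of hint appear together, and files that Word has not rendered and saved may contain no rendered-break hints, so treat the result as an approximation rather than true page ends.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def mark_page_hints(docx_file):
    """Return paragraph texts with a PAGEEND_<n> line at each stored break hint.

    Hypothetical helper: it only sees explicit page breaks and cached
    rendered-break hints, not the real layout.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    page = 1
    body = root.find(f"{{{W}}}body")
    # Top-level paragraphs only (tables and text boxes are ignored)
    for paragraph in body.findall(f"{{{W}}}p"):
        buffer = ""
        for node in paragraph.iter():
            is_hard_break = (
                node.tag == f"{{{W}}}br" and node.get(f"{{{W}}}type") == "page"
            )
            is_rendered_hint = node.tag == f"{{{W}}}lastRenderedPageBreak"
            if node.tag == f"{{{W}}}t":
                buffer += node.text or ""
            elif is_hard_break or is_rendered_hint:
                # Text collected so far sits before the break hint
                if buffer:
                    lines.append(buffer)
                    buffer = ""
                lines.append(f"PAGEEND_{page}")
                page += 1
        if buffer:
            lines.append(buffer)
    return lines
```

Calling `mark_page_hints('test.docx')` returns a list of strings that you could then write into a new document, one paragraph per entry.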

scanny
  • Thank you for your response. You are correct: we are not able to reliably extract the page endings. I am still searching for possible approaches. – Bonson Apr 04 '18 at 11:26
  • Your answer at the following link, https://stackoverflow.com/questions/24210216/how-to-identify-page-breaks-using-python-docx-from-docx, mentions a method using version 2.7. Would you have a solution for Python 3.0? – Bonson Apr 04 '18 at 11:29