I am trying to locate page breaks in a .docx document using Apache POI, so that I can work out the page number of a paragraph. This is the code I am using:

for (XWPFRun run : paragraph.getRuns()) {
    List<CTBr> brList = run.getCTR().getBrList();
    if (brList != null && !brList.isEmpty()) {
        for (CTBr br : brList) {
            if (br.getType() == STBrType.PAGE) {
                // page break detected
            }
        }
    } else {
        List<CTEmpty> lastRenderedPageBreakList = run.getCTR().getLastRenderedPageBreakList();
        if (lastRenderedPageBreakList != null) {
            for (CTEmpty lastRenderedPageBreak : lastRenderedPageBreakList) {
                // page break detected
            }
        }
    }
}

The code detects the page breaks on most pages, but not on all of them. Does anyone have an idea of what I am still missing?

Markos Fragkakis
dpalaka
  • Are you aware that Word is not a page-based format? So unlike PDF, the format doesn't explicitly break on pages – Gagravarr Jun 20 '14 at 13:11
  • Word somehow knows how to render the document and change pages, I am trying to figure out this mechanism. I suppose that there are elements in the document indicating page breaks, aren't there? The code above detects page breaks but not all of them. – dpalaka Jun 20 '14 at 20:08
  • Word has a full-blown rendering engine that it uses. Sometimes, but not always, Word will record where it last broke the page in the file format. It won't always, and it won't in all cases, and it's just a hint. The only way to know for sure is to render it, fonts / page sizes / margins / text / images and all, which POI doesn't support – Gagravarr Jun 20 '14 at 20:19
  • I have the same issue, please share the solution if you found any – WiredCoder Jul 03 '16 at 13:59
  • Possible duplicate of [Why only some page numbers stored in XML of docx file?](https://stackoverflow.com/questions/48680399/why-only-some-page-numbers-stored-in-xml-of-docx-file) – Cindy Meister Feb 26 '18 at 09:41
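
To make the "it's just a hint" point from the comments concrete, here is a rough sketch that counts the page-break markers a .docx can carry in its `word/document.xml`. It is not an answer to the question and it does not use POI: it parses the raw WordprocessingML with the JDK's DOM parser so it runs on its own. The class name `PageBreakHints`, the method `countHints` and the sample XML are made up for illustration. It assumes the three markers of interest are `w:br` with `w:type="page"`, `w:lastRenderedPageBreak`, and `w:pageBreakBefore` in paragraph properties; section breaks and pages that start simply because the text overflowed are not covered, so the counts are only a lower-bound hint, never a page number.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
class PageBreakHints {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Counts page-break markers in the XML of word/document.xml.
     * Returns {explicit w:br page breaks, w:lastRenderedPageBreak, w:pageBreakBefore}.
     */
    static int[] countHints(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match the w: namespace
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }

        // 1. Explicit breaks: only <w:br w:type="page"/>; a bare <w:br/> is a line break.
        int explicit = 0;
        NodeList brs = doc.getElementsByTagNameNS(W_NS, "br");
        for (int i = 0; i < brs.getLength(); i++) {
            Element br = (Element) brs.item(i);
            if ("page".equals(br.getAttributeNS(W_NS, "type"))) {
                explicit++;
            }
        }

        // 2. Hints left by the last application that rendered the document.
        //    Counted for every run, whether or not the run also contains a <w:br>.
        int rendered = doc.getElementsByTagNameNS(W_NS, "lastRenderedPageBreak").getLength();

        // 3. Paragraphs flagged "page break before", unless the flag is switched off.
        int before = 0;
        NodeList flags = doc.getElementsByTagNameNS(W_NS, "pageBreakBefore");
        for (int i = 0; i < flags.getLength(); i++) {
            String val = ((Element) flags.item(i)).getAttributeNS(W_NS, "val");
            if (!val.equals("false") && !val.equals("0") && !val.equals("off")) {
                before++;
            }
        }
        return new int[] {explicit, rendered, before};
    }

    public static void main(String[] args) {
        String xml =
            "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
          + "<w:p><w:r><w:t>a</w:t><w:br w:type=\"page\"/></w:r></w:p>"
          + "<w:p><w:r><w:lastRenderedPageBreak/><w:t>b</w:t><w:br/></w:r></w:p>"
          + "<w:p><w:pPr><w:pageBreakBefore/></w:pPr><w:r><w:t>c</w:t></w:r></w:p>"
          + "</w:body></w:document>";
        int[] c = countHints(xml);
        System.out.println("explicit=" + c[0] + " rendered=" + c[1] + " before=" + c[2]);
    }
}
```

The three counts are kept separate rather than summed, since it is not safe to assume the markers never describe the same break. One difference from the loop in the question: there, `lastRenderedPageBreak` is only looked at in the `else` branch, so it is skipped for any run that contains a `w:br` of any kind, including an ordinary line break. The sketch checks each marker independently.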

0 Answers