Edit header in dotx/docx file

Question

I am currently trying to generate a new docx file from an existing template in dotx format. I want to change the firstname, lastname, etc. in the header, but for some reason I'm not able to access them. My approach is the following:

 public void generateDocX(Long id) throws IOException, InvalidFormatException {

    //Get user per id
    EmployeeDTO employeeDTO = employeeService.getEmployee(id);

    //Location where the new docx file will be saved
    FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));

    //Get the template for generating the new docx file
    File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
    OPCPackage pkg = OPCPackage.open(template);
    XWPFDocument document = new XWPFDocument(pkg);

    for (XWPFHeader header : document.getHeaderList()) {
        List<XWPFParagraph> paragraphs = header.getParagraphs();
        System.out.println("Total paragraphs in header are: " + paragraphs.size());
        System.out.println("Total elements in the header are: " + header.getBodyElements().size());
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println("Paragraph text is: " + paragraph.getText());
            List<XWPFRun> runs = paragraph.getRuns();
            for (XWPFRun run : runs) {
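                // getText expects the index of the text element within the run;
                // getTextPosition() returns the vertical raise/lower offset instead
                String runText = run.getText(0);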
                System.out.println("Run text is: " + runText);
            }
        }
    }

    //Write the changes to the new docx file and close the document
    document.write(outputStream);
    document.close();
}

The console output is only ever 1, null or an empty string. I've tried several approaches from here, here and here, but without any luck.

Here is what's inside the template.dotx

I think that actually that box in which are firstname, lastname and etc. is a frame content or at least thats what Libre Office says, when I click on it.. — IvanNickSim, Mar 25 '20 at 15:05

Axel Richter · Accepted Answer · 2020-03-27T10:19:29.890

IBody.getParagraphs and IBody.getBodyElements only return the paragraphs or body elements that are directly in that IBody. Your paragraphs are not directly in the header; they are in a separate text box or text frame. That's why they cannot be retrieved this way.

Since *.docx is a ZIP archive containing XML files for the document, headers and footers, one could get all text runs of one IBody by creating an XmlCursor which selects all w:r XML elements. For an XWPFHeader this could look like so:

 private List<XmlObject> getAllCTRs(XWPFHeader header) {
  CTHdrFtr ctHdrFtr = header._getHdrFtr();
  XmlCursor cursor = ctHdrFtr.newCursor();
  cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
  List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
  while (cursor.hasNextSelection()) {
   cursor.toNextSelection();
   XmlObject obj = cursor.getObject();
   ctrInHdrFtr.add(obj);
  }
  return ctrInHdrFtr;
 }

Now we have a list of all XML elements in that header which are text-run-elements in Word.
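To see why selecting w:r at any depth finds text that the direct paragraph access misses, here is a minimal sketch using only the JDK's built-in XPath instead of XmlBeans. The class name, the helper selectText and the sample XML are assumptions for illustration; a real header wraps text boxes in additional drawing elements, which the sample leaves out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RunSelectionSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily simplified header part: the only text sits in a
    // text box (w:txbxContent) nested inside a run of the outer paragraph.
    static final String SAMPLE_HEADER =
          "<w:hdr xmlns:w='" + W_NS + "'>"
        + "<w:p><w:r><w:txbxContent>"
        + "<w:p><w:r><w:t>&lt;&lt;Firstname&gt;&gt;</w:t></w:r></w:p>"
        + "</w:txbxContent></w:r></w:p>"
        + "</w:hdr>";

    // Evaluates an XPath expression (prefix "w" bound to W_NS) and returns
    // the text content of every selected node.
    static List<String> selectText(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return W_NS.equals(uri) ? "w" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return W_NS.equals(uri)
                        ? Collections.singletonList("w").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Runs of paragraphs directly under the header: prints []
        System.out.println(selectText(SAMPLE_HEADER, "/w:hdr/w:p/w:r/w:t"));
        // Runs at any depth, including the text box: prints [<<Firstname>>]
        System.out.println(selectText(SAMPLE_HEADER, "//w:r/w:t"));
    }
}
```

The first expression mirrors what getParagraphs sees (only the outer paragraph, whose run holds the text box but no text of its own), while the second one behaves like the cursor path used above.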

We could have a more general getAllCTRs which gets all CTR elements from any kind of IBody like so:

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

Now we have a list of all XML elements in that IBody which are text-run-elements in Word.

Having that we can get the text out of them like so:

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

This probably shows the next challenge: Word is very messy in creating text-run elements. For example, your placeholder <<Firstname>> can be split into the text runs << + Firstname + >>. The reason might be different formatting, spell checking or something else. Even this is possible: << + Lastname + >>; << + YearOfBirth + >>. Or even this: <<Firstname + >> << + Lastname>>; << + YearOfBirth>>. You see, replacing the placeholders with text is nearly impossible because the placeholders may be split into multiple text runs.

To avoid this, the template.dotx needs to be created by users who know what they are doing.

First, turn spell check off, and grammar check as well. Otherwise every suspected spelling or grammar error is put into its own text run so that it can be marked accordingly.

Second, make sure the whole placeholder is formatted uniformly. Differently formatted text must also be in separate text runs.

I am really skeptical that this will work properly. But try it yourself.
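If the template cannot be cleaned up, one possible workaround (not what the complete example in this answer does) is to match the placeholder against the concatenated text of consecutive runs and then write the result back run by run. The following is only a sketch on plain strings: the list models the run texts of one paragraph in order, and the class and method names are assumptions. Applied to a real document, the replacement would take the formatting of the run in which the placeholder starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitPlaceholderSketch {

    // Replaces every occurrence of placeholder, even when it is spread over
    // several adjacent run texts. The returned list has the same number of
    // entries as the input: the value goes into the run where the placeholder
    // starts, and the other runs lose their part of the placeholder.
    static List<String> replaceAcrossRuns(List<String> runTexts,
                                          String placeholder, String value) {
        List<String> result = new ArrayList<>(runTexts);
        if (placeholder.isEmpty()) {
            return result;
        }
        int searchFrom = 0;
        while (true) {
            String joined = String.join("", result);
            int start = joined.indexOf(placeholder, searchFrom);
            if (start < 0) {
                return result;
            }
            int end = start + placeholder.length();
            int offset = 0;
            boolean inserted = false;
            for (int i = 0; i < result.size(); i++) {
                String text = result.get(i);
                int runStart = offset;
                int runEnd = offset + text.length();
                offset = runEnd;
                if (runEnd <= start || runStart >= end) {
                    continue; // this run does not overlap the placeholder
                }
                // Keep what lies before and after the placeholder in this run.
                String before = text.substring(0, Math.max(start - runStart, 0));
                String after = text.substring(Math.min(end, runEnd) - runStart);
                result.set(i, before + (inserted ? "" : value) + after);
                inserted = true;
            }
            // Continue behind the inserted value so it is never re-scanned.
            searchFrom = start + value.length();
        }
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Dear ", "<<", "Firstname", ">>", "!");
        // prints [Dear , Axel, , , !]
        System.out.println(replaceAcrossRuns(runs, "<<Firstname>>", "Axel"));
    }
}
```

Keeping the number of runs unchanged means the result can be written back position by position without adding or removing XML elements.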

Complete example:

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.List;
import java.util.ArrayList;

public class WordEditAllIBodys {

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

 private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    if (text != null && text.contains(placeHolder)) {
     text = text.replace(placeHolder, textValue);
     ctText.setStringValue(text);
     obj.set(ctr);
    }
   }
  }
 }

 public void generateDocX() throws Exception {

  FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));

  //Get the template for generating the new docx file
  File template = new File("./template.dotx");
  XWPFDocument document = new XWPFDocument(new FileInputStream(template));

  //traverse all headers
  for (XWPFHeader header : document.getHeaderList()) {
   printAllTextInTextRunsOfIBody(header);

   replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all footers
  for (XWPFFooter footer : document.getFooterList()) {
   printAllTextInTextRunsOfIBody(footer);

   replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse document body; note: tables needs not be traversed separately because they are in document body
  printAllTextInTextRunsOfIBody(document);

  replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
  replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
  replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");

  //traverse all footnotes
  for (XWPFFootnote footnote : document.getFootnotes()) {
   printAllTextInTextRunsOfIBody(footnote);

   replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all endnotes
  for (XWPFEndnote endnote : document.getEndnotes()) {
   printAllTextInTextRunsOfIBody(endnote);

   replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
  }  


  //since document was opened from *.dotx the content type needs to be changed
  document.getPackage().replaceContentType(
   "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
   "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

  //Write the changes to the new docx file and close the document
  document.write(outputStream);
  outputStream.close();
  document.close();
 }

 public static void main(String[] args) throws Exception {
  WordEditAllIBodys app = new WordEditAllIBodys();
  app.generateDocX();
 }
}

Btw.: Since your document was opened from *.dotx, the content type needs to be changed from wordprocessingml.template to wordprocessingml.document. Otherwise Word will not open the resulting *.docx document. See Converting a file with ".dotx" extension (template) to "docx" (Word File).

As I am skeptical about the placeholder-replacement approach, my preferred way is filling forms. See Problem with processing word document java. Of course such form fields cannot be used in headers or footers. So headers or footers should be created from scratch as a whole.

Thanks for your response! I've tried the code that you provided but for some reason nothing is happening ... Actually, the method that is supposed to print the texts in the runs of header (printAllTextInTextRunsOfHeader) is not printing anything at all. I figured out that the second for-loop is actually never executed.. Any further suggestions or you think that it will be better to recreate the template myself (I'm not sure that I'll make it any better but I can try..) ? — IvanNickSim, Mar 26 '20 at 13:07
@IvanNickSim: Maybe the text box is not even in the header but is in document and only placed over the header? See supplements in my answer. — Axel Richter, Mar 26 '20 at 13:36
I'll try your changes in a minute.. I've checked what is inside the docx file and I updated my post with a picture what is inside.. — IvanNickSim, Mar 26 '20 at 13:46
@Alex Thank you very much, mate! After the changes that you've made, now the firstname, lastname and the professional title are successfully changed! I suppose that I will have to do the same for the footer and the other headers as well since they're also different documents.. Am I right about that? — IvanNickSim, Mar 26 '20 at 13:54
@IvanNickSim: Yes you are correct. As you already had unzipped the `*.dotx` you had seen the different document parts. Following parts contain text-runs: `document.xml` is the document body, `headerN.xml` and `footerN.xml` are the different headers and footers (default, even, first), `endnotes.xml` contains endnote text, `footnotes.xml` contains footnote text. — Axel Richter, Mar 26 '20 at 14:07
I've just figured out that this code snippet is ignoring the footers... I've tried to access them again in my old way using XWPFHeaderFooterPolicy but it displays null again.. As far as I can understand you are setting the cursor.selectPath to take all from the main and to be able to write and read, so it should include them but apparently not.. — IvanNickSim, Mar 27 '20 at 08:10
@IvanNickSim: I have updated my answer to provide a more general example which considers all possible `Word` document parts. — Axel Richter, Mar 27 '20 at 10:21
Thanks mate, works like a charm at the moment! The only problem is like you said that the placeholders are not properly created and in the footer it changes for the <> but for <>, I've had to change the required placeholder to be Lastname only, instead of <>.. Anyway, great work and thank you very much! — IvanNickSim, Mar 27 '20 at 11:20
Mate, how can I make it possible to replace only the first met placeholder? For example, I have many projects for one employee but the placeholder is always <>. When I use foreach to replace, It replaces all placeholders with the last object in the cycle, any suggestions? — IvanNickSim, Apr 06 '20 at 15:01
@IvanNickSim: Have a counter `int numberOfPlaceholder` in `replaceTextInTextRunsOfIBody` which gets incremented in `if (text != null && text.contains(placeHolder)) { numberOfPlaceholder++; ...`. Then do `text = text.replace(placeHolder, textValue); ctText.setStringValue(text); obj.set(ctr);` only if the counter has a special value? — Axel Richter, Apr 06 '20 at 15:20
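The counter idea from the last comment could look roughly like the sketch below. It works on plain strings standing in for the text of each run; the class name, the method replaceNth and the placeholder <<Project>> are assumptions for illustration only. In replaceTextInTextRunsOfIBody the same check would wrap the setStringValue and obj.set calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NthPlaceholderSketch {

    // Replaces the placeholder only in the occurrence-th run text that
    // contains it (1-based); all other run texts are left untouched.
    static List<String> replaceNth(List<String> runTexts, String placeholder,
                                   String value, int occurrence) {
        List<String> result = new ArrayList<>(runTexts);
        int numberOfPlaceholder = 0;
        for (int i = 0; i < result.size(); i++) {
            String text = result.get(i);
            if (text != null && text.contains(placeholder)) {
                numberOfPlaceholder++;
                if (numberOfPlaceholder == occurrence) {
                    result.set(i, text.replace(placeholder, value));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("<<Project>>", "x", "<<Project>>");
        // Once the first match is filled, the next one becomes the first,
        // so asking for occurrence 1 each time fills them in order.
        for (String project : Arrays.asList("Alpha", "Beta")) {
            runs = replaceNth(runs, "<<Project>>", project, 1);
        }
        System.out.println(runs); // prints [Alpha, x, Beta]
    }
}
```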
