4

I have a file, "template.docx", containing placeholders (e.g. [serial number]) that I would like to replace with a string or possibly a table. I am using Apache POI, and no, I cannot use docx4j.

Is there a way to have the program iterate over all occurrences of "[serial number]" and replace them with a string? Many of these tags will be inside a large table, so is there an Apache POI equivalent of pressing Ctrl+F in Word and using Replace All?

Any suggestions would be appreciated, thanks.

Cole
  • don't know if it's possible with Apache Poi, but docxtemplater provides a command line interface that does exactly that: https://github.com/edi9999/docxtemplater and http://javascript-ninja.fr/docxgenjs/examples/demo.html for a demo – edi9999 Jun 06 '14 at 16:34
  • 1
    there is also YARG template engine based on poi https://github.com/Haulmont/yarg/wiki – Konstantin V. Salikhov Jun 06 '14 at 18:37

2 Answers

6

An XWPFDocument (docx) contains different kinds of sub-elements, such as XWPFParagraph, XWPFTable and XWPFNumbering.

First create an XWPFDocument object:

document = new XWPFDocument(inputStream);

You can then iterate through all of its paragraphs:

document.getParagraphsIterator();

As you iterate over the paragraphs, each paragraph gives you several XWPFRuns, which are blocks of text sharing the same styling. Sometimes text with identical styling is still split into multiple XWPFRuns; in that case, look into this question to avoid the splitting, which lets you identify your placeholders without merging multiple runs within the same paragraph.

Note that the document-level iterator only covers body paragraphs. Paragraphs inside tables are reached through the tables themselves: each XWPFTable has rows, each row has cells, and each XWPFTableCell exposes its own getParagraphs().

From here on, the assumption is that your placeholder is not split across multiple runs. If that holds, you can iterate over the XWPFRuns of each paragraph and look for text matching your placeholder, something like this:

// 'expression' is a precompiled java.util.regex.Pattern for the placeholder
XWPFParagraph para = (XWPFParagraph) xwpfParagraphElement;
for (XWPFRun run : para.getRuns()) {
    String text = run.getText(0); // null when the run has no text at position 0
    if (text != null) {
        Matcher expressionMatcher = expression.matcher(text);
        if (expressionMatcher.find() && expressionMatcher.groupCount() > 0) {
            System.out.println("Expression Found...");
        }
    }
}

Here expressionMatcher is a Matcher built from a regular expression for a particular placeholder. Try a regex that also matches optional text before and after your placeholder, e.g. \([]*)(PlaceHolderGroup)([]*)^; trust me, it works best.
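As a rough illustration of placeholder matching with java.util.regex, here is a minimal sketch. It is not the pattern from this answer: the class name PlaceholderRegexSketch, the PLACEHOLDER pattern and the fill method are all hypothetical, and it assumes tags are written in square brackets like [serial number]. It works on plain strings only, so the text would still have to come from and go back into a run.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRegexSketch {
    // Hypothetical pattern: a bracketed tag such as [serial number]; group 1 is the name inside.
    static final Pattern PLACEHOLDER = Pattern.compile("\\[([^\\]\\[]+)\\]");

    // Replaces every known placeholder in text; unknown placeholders are left untouched.
    static String fill(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Unit SN-0042 shipped"
        System.out.println(fill("Unit [serial number] shipped", Map.of("serial number", "SN-0042")));
    }
}
```

Capturing the tag name in a group means one pattern can serve every placeholder, with the replacement values kept in a map.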

Once you find the right XWPFRun, extract the text of interest and build the replacement text, which should be easy enough. Then replace the previous text in this run with the new text:

run.setText(text, 0);
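If the placeholder does end up split across several runs, one possible workaround is to match against the concatenated run texts and then write the result back fragment by fragment. The sketch below shows only that bookkeeping on a plain List<String> standing in for the run texts. SplitRunSketch and replaceAcrossRuns are hypothetical names, the texts are assumed to be non-null, and only the first occurrence is handled. With POI, each returned string would presumably be written back to its run with run.setText(text, 0).

```java
import java.util.ArrayList;
import java.util.List;

class SplitRunSketch {
    // Replaces the first occurrence of placeholder even when it spans several fragments.
    // The result has the same number of fragments, so each one still maps to its original run.
    static List<String> replaceAcrossRuns(List<String> runTexts, String placeholder, String value) {
        String joined = String.join("", runTexts);
        int start = joined.indexOf(placeholder);
        List<String> result = new ArrayList<>(runTexts);
        if (start < 0) {
            return result; // placeholder not present
        }
        int end = start + placeholder.length();
        int offset = 0;
        boolean written = false;
        for (int i = 0; i < runTexts.size(); i++) {
            String t = runTexts.get(i);
            int runStart = offset;
            int runEnd = offset + t.length();
            offset = runEnd;
            if (runEnd <= start || runStart >= end) {
                continue; // this fragment does not overlap the placeholder
            }
            // Keep the text outside the placeholder; put the value into the first overlapping fragment only.
            String before = t.substring(0, Math.max(start, runStart) - runStart);
            String after = t.substring(Math.min(end, runEnd) - runStart);
            result.set(i, before + (written ? "" : value) + after);
            written = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[SN: A-1, ,  ok]"
        System.out.println(replaceAcrossRuns(List.of("SN: [ser", "ial num", "ber] ok"), "[serial number]", "A-1"));
    }
}
```

Keeping one output fragment per input fragment means no run has to be removed, so the styling of the surrounding text is left alone.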

If you want to replace the whole XWPFRun with a completely new XWPFRun, or insert a new paragraph or table after the paragraph owning the run, you will probably run into a few problems: (A) a ConcurrentModificationException, which means you cannot modify the list of XWPFRuns while you are iterating over it, and (B) finding the position at which to insert the new element. To resolve these issues, keep a List<XWPFParagraph> holding the paragraphs after which a new element is to be inserted. Once you have this list, iterate over it and, for each paragraph, get a cursor and insert the new element at that cursor:

for (XWPFParagraph para: paras) {
    XmlCursor cursor = (XmlCursor) para.getCTP().newCursor();
    XWPFTable newTable = para.getBody().insertNewTbl(cursor);
    //Generate your XWPF table based on what's inside para with your own logic
}
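The collect-first, modify-later idea is independent of POI, so here is a minimal sketch of it on a plain list of strings standing in for paragraphs. DeferredInsertSketch, insertAfterMatches and the "<table>" marker are hypothetical and purely illustrative. The real code would collect XWPFParagraph objects and insert through a cursor as shown above.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredInsertSketch {
    // Inserts marker after every element containing tag, never modifying a list while iterating over it.
    static List<String> insertAfterMatches(List<String> paragraphs, String tag, String marker) {
        List<String> result = new ArrayList<>(paragraphs);
        // Pass 1: read only, remembering where an insertion is needed.
        List<Integer> matchIndexes = new ArrayList<>();
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).contains(tag)) {
                matchIndexes.add(i);
            }
        }
        // Pass 2: insert from the end so the earlier indexes stay valid.
        for (int k = matchIndexes.size() - 1; k >= 0; k--) {
            result.add(matchIndexes.get(k) + 1, marker);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[a [t], <table>, b, c [t], <table>]"
        System.out.println(insertAfterMatches(List.of("a [t]", "b", "c [t]"), "[t]", "<table>"));
    }
}
```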

To create an XWPFTable, read this.

Hope this helps someone.

user2009750
  • 3
    Quick note - If you have a File, use it! Don't turn it into a stream to give to POI, that leads to higher memory use. See [this bit of the docs](http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream) for details – Gagravarr Jul 15 '15 at 08:26
  • 1
    The text representing the placeholder can be fragmented across several XWPFRuns, so the run.getText(0) call may return only part of the placeholder rather than all of it. This is the problem I'm trying to overcome – Forcuti Alessandro Dec 03 '19 at 09:07
-1
        // Text nodes are w:t elements in the Word document XML
        final String XPATH_TO_SELECT_TEXT_NODES = "//w:t";
        try {
            // Open the input file
            File inputFile = new File("D:\\temp\\test.docx");
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(inputFile);

            // Build a list of "text" elements
            List<?> texts = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);

            // Placeholder text -> replacement text
            HashMap<String, String> mappings = new HashMap<String, String>();
            mappings.put("1", "one");
            mappings.put("2", "two");

            // Loop through all "text" elements and swap any whose entire content is a known placeholder
            for (Object obj : texts) {
                Text text = (Text) ((JAXBElement<?>) obj).getValue();
                String textToReplace = text.getValue();
                if (mappings.containsKey(textToReplace)) {
                    text.setValue(mappings.get(textToReplace));
                }
            }

            wordMLPackage.save(new java.io.File("D:/temp/forPrint.docx")); // your path
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }

}
  • 4
    Welcome to Stack Overflow! This is a pretty large code block, so it would be helpful if you edited it to include a description of how it solves the question. – josliber Sep 14 '15 at 05:09
  • 3
    This answer doesn't address the question, since it uses docx4j instead of Apache POI, while the question explicitly asks how to replace placeholders **with apache poi** – Yannick Huber Aug 20 '18 at 14:32