1

I have a few Word templates, and I need to replace some words/placeholders in the documents based on user input, using Java. I tried a lot of libraries, including 2-3 versions of docx4j, but none worked well; they simply did nothing!

I know this question has been asked before, but I have tried every option I know of. So, which Java library can I use to "really" replace/edit text in these templates? I prefer libraries that are easy to use and need only a few lines of code.

I am using Java 8, and my templates are in MS Word 2007 format.

Update

This code is based on the code sample provided by SO member Joop Eggen:

public Main() throws URISyntaxException, IOException, ParserConfigurationException, SAXException
{
    URI docxUri = new URI("C:/Users/Yohan/Desktop/yohan.docx");
    Map<String, String> zipProperties = new HashMap<>();
    zipProperties.put("encoding", "UTF-8");

    FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);

    Path documentXmlPath = zipFS.getPath("/word/document.xml");

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(Files.newInputStream(documentXmlPath));

    byte[] content = Files.readAllBytes(documentXmlPath);
    String xml = new String(content, StandardCharsets.UTF_8);
    //xml = xml.replace("#DATE#", "2014-09-24");
    xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));

    content = xml.getBytes(StandardCharsets.UTF_8);
    Files.write(documentXmlPath, content);
}

However, this produces the error below:

java.nio.file.ProviderNotFoundException: Provider "C" not found
    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:276)
halfer
PeakGen
  • maybe for consideration (I would go for Apache HWPF): http://stackoverflow.com/questions/203174/whats-a-good-java-api-for-creating-word-documents – CsBalazsHungary Sep 24 '14 at 11:42
  • @CsBalazsHungary: Link created 5 years ago. Java 8 was not there at that time. – PeakGen Sep 24 '14 at 11:43
  • Is MS Word 2007 already .docx? That format is ideal: you can use a Java zip file system and change /word/document.xml. The libraries do not guarantee that the original formatting is preserved. – Joop Eggen Sep 24 '14 at 11:44
  • @Sniper sadly it indeed can cause problem :( – CsBalazsHungary Sep 24 '14 at 11:45
  • @JoopEggen: Yeah it is Docx. Prefer to see a library, you know, easy. – PeakGen Sep 24 '14 at 11:45
  • @JoopEggen: Is there any sample code? – PeakGen Sep 24 '14 at 11:48
  • The XML is readable too, load the DOM (or just text), replace the place holders and done. But I may be too naive w.r.t. the requirements. Rename .docx into .zip and take a look. I'll provide some sample code after a moment. – Joop Eggen Sep 24 '14 at 11:49
  • I've never done anything with Word, but I once used Apache POI for Excel and it worked pretty well, for both .xls and .xlsx: http://poi.apache.org/ – blagae Sep 24 '14 at 11:55

5 Answers

4

For docx (a zip archive containing XML and other files), one can use a Java zip file system together with XML or plain text processing.

URI docxUri = ...; // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
    Path documentXmlPath = zipFS.getPath("/word/document.xml");

When using XML:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(Files.newInputStream(documentXmlPath));
    //Element root = doc.getDocumentElement();

You can then use XPath to find the places, and write the XML back again.
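As a rough sketch of that XPath route (not necessarily what the answer had in mind), the following parses the XML, rewrites every `t` text element that contains a placeholder, and serializes the result. The class and method names are made up for illustration, and the XML is taken from a string rather than from the zip entry to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
class DomPlaceholderReplacer {

    /** Replaces a placeholder in every "t" (text) element and returns the serialized XML. */
    static String replaceInTextNodes(String xml, String placeholder, String value) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // local-name() avoids registering the "w" namespace prefix with XPath.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList textNodes = (NodeList) xpath.evaluate(
                "//*[local-name()='t']", doc, XPathConstants.NODESET);

        for (int i = 0; i < textNodes.getLength(); i++) {
            Node node = textNodes.item(i);
            String text = node.getTextContent();
            if (text.contains(placeholder)) {
                // setTextContent stores plain text; the serializer escapes it on output.
                node.setTextContent(text.replace(placeholder, value));
            }
        }

        // Write the DOM back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Because the value is set through the DOM, no manual XML escaping is needed here. A placeholder that Word has split across several text elements would not be found by this sketch.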

You might not even need XML processing; plain text replacement of the placeholders may be enough:

    byte[] content = Files.readAllBytes(documentXmlPath);
    String xml = new String(content, StandardCharsets.UTF_8);
    xml = xml.replace("#DATE#", "2014-09-24");
    xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
    ...
    content = xml.getBytes(StandardCharsets.UTF_8);
    Files.delete(documentXmlPath);
    Files.write(documentXmlPath, content);

For fast development, rename a copy of the .docx so that it has the .zip file extension, and inspect the files inside.

Files.write should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete because an error occurred without it. See the comments.
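Putting the text-only variant together, a minimal self-contained sketch could look like the one below. The class name, the method name and the small `escapeXml` helper (a stand-in for `StringEscapeUtils.escapeXml`, so that no extra library is needed) are assumptions for illustration. It builds the "jar:" URI from a `Path` and keeps the delete-then-write step described above.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named here only for illustration.
class DocxTextReplacer {

    /** Minimal escaping for XML text content; a stand-in for StringEscapeUtils.escapeXml. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces every occurrence of the placeholder in /word/document.xml of the given .docx. */
    static void replacePlaceholder(Path docx, String placeholder, String value) throws IOException {
        // Path.toUri() gives a "file:" URI; the zip provider needs "jar:" in front of it.
        URI docxUri = URI.create("jar:" + docx.toUri());
        Map<String, String> zipProperties = new HashMap<>();
        zipProperties.put("encoding", "UTF-8");

        // The archive is only updated on disk when the file system is closed,
        // so the try-with-resources block matters.
        try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
            Path documentXmlPath = zipFS.getPath("/word/document.xml");

            byte[] content = Files.readAllBytes(documentXmlPath);
            String xml = new String(content, StandardCharsets.UTF_8);
            xml = xml.replace(placeholder, escapeXml(value));

            // Delete first, then write, as in the answer.
            Files.deleteIfExists(documentXmlPath);
            Files.write(documentXmlPath, xml.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A call such as `DocxTextReplacer.replacePlaceholder(Paths.get("C:/Users/Yohan/Desktop/yohan.docx"), "#NAME#", "Sniper")` would modify the file in place, so it is safer to work on a copy of the template, with the document closed in MS Word.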

Joop Eggen
  • Thanks for the reply. You mean change the extension of my docx into .zip ? – PeakGen Sep 24 '14 at 12:07
  • hmm.. Can this work with word docx including images and tables? – PeakGen Sep 24 '14 at 12:10
  • You seem to be a great person, I will test and get back to you. – PeakGen Sep 24 '14 at 12:28
  • The images are in /media. And no, not so great, experience comes to everyone, now to you. The smart thing here, is not using java's ZipFile but the java FileSystem from "jar:file:" URIs. Then one can copy an image file into the docx with just `Files.copy` etc. – Joop Eggen Sep 24 '14 at 12:31
  • `The images are in /media` what did you mean? I have images and tables in my template. I don't want to touch any of them via Java, just wanted to know whether your example can read the file, replace the "text" and write back as it was; which means, without destroying the images, tables which were in the document, but of course with the replaced text. – PeakGen Sep 24 '14 at 12:47
  • Yes, the (zip) file xxx.docx remains unchanged; I thought you wanted to exchange images too. The fact that everything remains as-is, is the advantage of this approach. And XML is text, not using file positions. Hence not much can go wrong. – Joop Eggen Sep 24 '14 at 13:02
  • I am having a problem with `Document doc = builder.parse(Files.newInputStream(documentXmlPath));`. Where did you import this from? – PeakGen Sep 24 '14 at 14:48
  • `org.w3c.dom.Document`, and `Files` is part of Java SE 7 (`java.nio.file.Files`). Maybe first try the text-only version rather than the XML one; that code is complete. – Joop Eggen Sep 24 '14 at 14:53
  • Thanks, I updated my question with your answer. However it returned `java.nio.file.ProviderNotFoundException` – PeakGen Sep 24 '14 at 15:16
  • `URI docxUri = new URI("jar:file:/C:/Users/Yohan/Desktop/yohan.docx");` Without the protocol it searched for a "C:" protocol. It is a `File.toURI()` but with "jar:" in front. – Joop Eggen Sep 24 '14 at 15:22
  • Thanks. Now it says `java.nio.file.FileAlreadyExistsException: word/document.xml` – PeakGen Sep 24 '14 at 15:49
  • Added Files.delete; strange that write does not replace the file. – Joop Eggen Sep 24 '14 at 16:23
  • Unfortunately, it did not work. Nothing is replaced :( – PeakGen Sep 24 '14 at 17:01
  • It worked for me (delete was needed). One must have MS Word closed, of course. – Joop Eggen Sep 24 '14 at 19:58
3

Try Apache POI. POI can work with both doc and docx, but docx is better documented and therefore better supported.

UPD: You can use XDocReport, which uses POI. I also recommend using xlsx for templates because it is more suitable and better documented.

Michael Kazarian
2

I spent a few days on this issue until I found that what makes the difference is the try-with-resources on the FileSystem instance, which appears in Joop Eggen's snippet but not in the question's snippet:
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))
Without this try-with-resources block, the FileSystem resource is never closed (as explained in the Java tutorial), and the Word document is not modified.

Caroh
0

Stepping back a bit, there are about 4 different approaches for editing words/placeholders:

  • MERGEFIELD or DOCPROPERTY fields (if you are having problems with this in docx4j, then you have probably not set up your input docx correctly)
  • content control databinding
  • variable replacement on the document surface (either at the DOM/SAX level, or using a library)
  • do stuff as XHTML, then import that

Before choosing one, you should decide whether you also need to be able to handle:

  • repeating data (eg adding table rows)
  • conditional content (eg entire paragraphs which will either be present or absent)
  • adding images

If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation, as advocated in one of the other answers, messier and more error-prone.

The other things to consider are:

  • your authors: how technical are they? What does that imply for the authoring UI?
  • the "user input" you mention for variable replacement, is this given, or is obtaining it part of the problem you are solving?
JasonPlutext
0

Please try this to edit or replace a word in the document:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class UpdateDocument {

    public static void main(String[] args) throws IOException {

        UpdateDocument obj = new UpdateDocument();

        obj.updateDocument(
                  "c:\\test\\template.docx",
                  "c:\\test\\output.docx",
                  "Piyush");
    }

    private void updateDocument(String input, String output, String name)
        throws IOException {

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(input)))
        ) {

            List<XWPFParagraph> xwpfParagraphList = doc.getParagraphs();
            // Iterate over the paragraphs and check each run for the placeholder
            for (XWPFParagraph xwpfParagraph : xwpfParagraphList) {
                for (XWPFRun xwpfRun : xwpfParagraph.getRuns()) {
                    String docText = xwpfRun.getText(0);
                    // getText(0) returns null for runs without text (e.g. images)
                    if (docText == null) {
                        continue;
                    }
                    // Replace the placeholder and write the text back at position 0
                    docText = docText.replace("${name}", name);
                    xwpfRun.setText(docText, 0);
                }
            }

            // Save the modified document to the output file
            try (FileOutputStream out = new FileOutputStream(output)) {
                doc.write(out);
            }

        }

    }

}
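
One caveat with run-by-run replacement: Word often splits what looks like a single word into several runs, so `${name}` may never appear whole inside one run. The sketch below shows one possible workaround on plain strings standing in for the run texts of a paragraph. The class and method names are hypothetical and this is not a POI API; with POI you would read the texts via `getText(0)` and write them back via `setText(..., 0)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
class RunMerger {

    /**
     * Joins the run texts of one paragraph, applies the replacement, and returns
     * new run texts: the merged result in the first run and empty strings after it.
     * If the placeholder does not occur, the run texts are returned unchanged.
     */
    static List<String> replaceAcrossRuns(List<String> runTexts, String placeholder, String value) {
        StringBuilder joined = new StringBuilder();
        for (String text : runTexts) {
            // Runs without text (null) contribute nothing.
            if (text != null) {
                joined.append(text);
            }
        }
        if (joined.indexOf(placeholder) < 0) {
            return new ArrayList<>(runTexts);
        }

        List<String> result = new ArrayList<>();
        result.add(joined.toString().replace(placeholder, value));
        for (int i = 1; i < runTexts.size(); i++) {
            result.add("");
        }
        return result;
    }
}
```

The trade-off is that all text of the paragraph ends up in the first run, so formatting that differed between runs (bold, colour, and so on) is lost for that paragraph.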