
I see that there is a converter called WordToHtmlConverter, but its process method is not exposed. How do I pass it a .doc file and get back an HTML file (or an OutputStream)?

Andrew Thompson
Ron
  • Is this what you're asking? http://stackoverflow.com/questions/227236/convert-word-doc-to-html-programmatically-in-java – enrique2334 Oct 23 '11 at 19:54
  • It's not. Apache POI has new classes in the package org.apache.poi.hwpf.converter to handle that, but I couldn't find any tutorial on how to use them. – Ron Oct 24 '11 at 12:52

1 Answer


This code is now working for me!

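    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocumentCore;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.converter.WordToHtmlUtils;
    import org.w3c.dom.Document;

    // Load the .doc file.
    HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("D:\\temp\\seo\\1.doc"));

    // The converter writes its HTML into an empty DOM document.
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);

    // Serialize the DOM as HTML into the output stream.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    out.close();

    // Decode with the same charset the serializer used, not the platform default.
    String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(result);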
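
The serialization half of the code above can be tried on its own, without Apache POI on the classpath. The following is only a sketch under a few assumptions: `DomToHtmlSketch`, `sampleDocument`, `writeHtml` and `toHtml` are hypothetical names, and the hand-built DOM tree merely stands in for whatever `wordToHtmlConverter.getDocument()` returns. It shows that any `OutputStream` (a `FileOutputStream`, for instance) can be handed to `StreamResult`, which is probably the simplest way to get an HTML file rather than a string.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration; not part of Apache POI.
class DomToHtmlSketch {

    // Builds a tiny DOM tree that stands in for the converter's output document.
    static Document sampleDocument(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent(text);
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM as UTF-8 HTML into any OutputStream the caller supplies.
    static void writeHtml(Document doc, OutputStream out) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "no");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper: serialize into memory and decode with the same charset.
    static String toHtml(Document doc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeHtml(doc, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toHtml(sampleDocument("Hello")));
    }
}
```

To write a file instead, the same `writeHtml` call could be given a `FileOutputStream` opened on the target path and closed afterwards.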
Ron