I see that there is a converter called WordToHtmlConverter, but the process method is not exposed. How should I pass a .doc file and get an HTML file (or an OutputStream)?

Andrew Thompson

Ron
- Is this what you're asking? http://stackoverflow.com/questions/227236/convert-word-doc-to-html-programmatically-in-java – enrique2334 Oct 23 '11 at 19:54
- It's not. Apache POI has new classes in the package org.apache.poi.hwpf.converter to handle that, but I couldn't find any tutorial on how to use them. – Ron Oct 24 '11 at 12:52
1 Answer
This code is now working for me!
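```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

// Load the .doc file and close the input stream once it has been read
HWPFDocumentCore wordDocument;
try (InputStream in = new FileInputStream("D:\\temp\\seo\\1.doc")) {
    wordDocument = WordToHtmlUtils.loadDoc(in);
}
// Convert the Word content into an in-memory HTML DOM
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();
// Serialize the DOM as indented UTF-8 HTML
ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
// Decode with the same charset the serializer wrote
String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
System.out.println(result);
```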
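Since the question asks for an HTML file or an OutputStream rather than a String, the serialization half of the answer can be pointed at any stream. The sketch below factors that step into a small helper that uses only JDK classes; the class name `HtmlWriter` and method `writeHtml` are hypothetical, not part of Apache POI. It expects the `org.w3c.dom.Document` returned by `WordToHtmlConverter.getDocument()`.

```java
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical helper: writes an HTML DOM (e.g. from WordToHtmlConverter) to a stream. */
public final class HtmlWriter {
    private HtmlWriter() {}

    /**
     * Serializes htmlDocument as indented UTF-8 HTML into out.
     * The caller owns out and is responsible for closing it.
     */
    public static void writeHtml(Document htmlDocument, OutputStream out)
            throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        // StreamResult wraps the caller's stream; nothing is buffered into a String
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
    }
}
```

To write straight to a file, open a `java.io.FileOutputStream` in a try-with-resources block and pass it together with `wordToHtmlConverter.getDocument()`. `StreamResult` also has a constructor that takes a `java.io.File`, if you prefer to let the transformer open the file itself.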

Ron