I want to create a Word document from an HTML page. My plan is to extract the values from the HTML page and pass them to a document template. I already use jsoup to parse the page, so the values are available in my Java program. What are the best techniques for creating the document template and for passing the values to it to produce the Word document?

Thank You.

Sunmit Girme

3 Answers

I found an approach that is interesting and simple. We just need to create a simple .xml template for the document we want to produce, then programmatically change the contents of the XML file and save it as an MS Word document.

You can find the xml template and the code here.
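
As a rough illustration of the idea (not the linked code), here is a minimal sketch that fills placeholders in a Word 2003 XML template and writes the result to disk. The class name `WordXmlTemplateFiller`, the `${...}` placeholder syntax, the trimmed template, and the output file name are all assumptions made for this example; a real template would normally be saved from Word itself. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordXmlTemplateFiller {

    // Hypothetical, heavily trimmed Word 2003 XML template with ${...} placeholders.
    static final String TEMPLATE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<?mso-application progid=\"Word.Document\"?>\n"
        + "<w:wordDocument xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">\n"
        + "  <w:body>\n"
        + "    <w:p><w:r><w:t>Name: ${name}</w:t></w:r></w:p>\n"
        + "    <w:p><w:r><w:t>City: ${city}</w:t></w:r></w:p>\n"
        + "  </w:body>\n"
        + "</w:wordDocument>\n";

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Escape characters that would otherwise break the XML document.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Replace every ${key} in a single pass; unknown keys are left untouched.
    static String fill(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(m -> {
            String key = m.group(1);
            String replacement = values.containsKey(key)
                    ? escapeXml(values.get(key))
                    : m.group();
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) throws IOException {
        // In the question's setup these values would come from the parsed HTML page.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Smith & Sons");
        values.put("city", "Pune");

        // Assumed output name; the content is Word 2003 XML.
        Path out = Path.of("letter.xml");
        Files.write(out, fill(TEMPLATE, values).getBytes(StandardCharsets.UTF_8));
    }
}
```

Escaping the values matters here: text scraped from an HTML page can easily contain `&` or `<`, which would make the generated XML invalid.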

Sunmit Girme
  • That's DOCX format only though. Good if you are using recent versions of Word. – Paul Jowett Mar 15 '12 at 09:55
  • I was able to create a .doc file. I first created a temporary XML file containing the dynamic values. Then I converted this temporary XML into a .doc by using the response.setContentType() and response.setHeader() methods of HttpServletResponse. I wanted the document to be downloaded as soon as it was created. – Sunmit Girme Oct 18 '12 at 05:27

I suggest you use XSLT, because your data is already in XML format and Microsoft provides well-defined XML formats for its Office documents.

You could write a document template in Word and save it in XML format. You can then turn that Word XML into an XSLT stylesheet that takes your HTML/XML as input. After the XSLT transformation you have a valid Word XML document containing the dynamic values from that input.

XSLT example for Excel:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="no" />
    <xsl:template match="/">
        <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:x="urn:schemas-microsoft-com:office:excel"
            xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
            xmlns:html="http://www.w3.org/TR/REC-html40">
            ...
            <xsl:for-each select="/yourroot/person">
                ...
                <Cell ss:StyleID="uf">
                    <Data ss:Type="String">
                        <xsl:value-of select="@Name" />
                    </Data>
                </Cell>
                ..
            </xsl:for-each>

            ...
    </xsl:template>
</xsl:stylesheet>
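
The same pattern can be driven from Java with the JAXP classes that ship with the JDK. The following is a minimal sketch, not a complete solution: the class name `WordXsltDemo`, the sample input, and the tiny Word 2003 XML stylesheet are assumptions made for illustration, and the input reuses the `/yourroot/person` shape from the Excel example. The stylesheet declares version 1.0 because the JDK's built-in processor implements XSLT 1.0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WordXsltDemo {

    // Hypothetical input with the same shape as the Excel example above.
    static final String INPUT =
        "<yourroot><person Name=\"Alice\"/><person Name=\"Bob\"/></yourroot>";

    // Hypothetical stylesheet: emits one Word 2003 XML paragraph per person.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
        + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:processing-instruction name=\"mso-application\">"
        + "progid=\"Word.Document\"</xsl:processing-instruction>"
        + "<w:wordDocument><w:body>"
        + "<xsl:for-each select=\"/yourroot/person\">"
        + "<w:p><w:r><w:t><xsl:value-of select=\"@Name\"/></w:t></w:r></w:p>"
        + "</xsl:for-each>"
        + "</w:body></w:wordDocument>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet to the XML input and return the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, STYLESHEET));
    }
}
```

In practice the stylesheet would be derived from a template saved by Word, with `xsl:value-of` elements placed where the dynamic values belong.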
Andreas
  • Thank you. I came across something named [java2word](http://code.google.com/p/java2word/). Do you think it might help? PS: I don't really know how XSLT works; I've only just started studying it. – Sunmit Girme Mar 14 '12 at 08:12

JODReports and Docmosis might also be useful options for you, since both populate templates and can produce DOC output. If DOCX is your real target, you can write out the document yourself, since the XML format is published, but that is a lot of work.
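
To give a feel for what writing DOCX by hand involves, here is a minimal sketch using only the JDK: a DOCX file is a ZIP package containing a content-types part, a relationships part, and the main document part. The class name `MinimalDocxWriter` and the output file name are assumptions for this example, and it produces a single unstyled paragraph; styles, tables, and images each need additional parts, which is where the work piles up. It assumes Java 11 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class MinimalDocxWriter {

    static final String CONTENT_TYPES =
          "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\""
        + " ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\"application/"
        + "vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    static final String RELS =
          "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Relationships"
        + " xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/"
        + "officeDocument/2006/relationships/officeDocument\""
        + " Target=\"word/document.xml\"/>"
        + "</Relationships>";

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Main document part: a body with one paragraph holding the given text.
    static String documentXml(String text) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
             + "<w:document xmlns:w="
             + "\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
             + "<w:body><w:p><w:r><w:t>" + escapeXml(text) + "</w:t></w:r></w:p></w:body>"
             + "</w:document>";
    }

    // Assemble the three required parts into an in-memory ZIP package.
    static byte[] buildDocx(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", RELS);
            addEntry(zip, "word/document.xml", documentXml(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String content)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        // Assumed output file name.
        Files.write(Path.of("hello.docx"), buildDocx("Hello from Java"));
    }
}
```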

Paul Jowett