3

I am able to generate a PDF from a .docx file using docx4j, but I need to convert a .doc file to PDF, including images and tables. Is there any way to convert .doc to .docx in Java, or .doc directly to PDF?

nsgulliver
user2211381
  • 1
    You can run OpenOffice from the terminal (http://dag.wieers.com/home-made/unoconv/) to use it to convert files. It might not be the best solution, but it is a rather easy one. – mqchen Mar 26 '13 at 12:13
  • I also need the solution. Did you find it? If yes, please share the code. – Second View Dec 20 '20 at 10:50

4 Answers

3

docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production-ready code. Last I checked, there were limits to POI's HWPF parsing of a binary .doc.

Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.
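As a rough illustration of the LibreOffice route, here is a minimal sketch that shells out to LibreOffice's headless converter from Java instead of going through jodconverter. It assumes an `soffice` binary is installed and on the PATH; the class name `SofficeConvert` and its method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of docx4j or jodconverter.
class SofficeConvert {

    // Builds the command line for a headless LibreOffice conversion to PDF.
    public static List<String> buildCommand(String sofficeBinary, Path input, Path outputDir) {
        return Arrays.asList(
                sofficeBinary,
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // Runs the conversion and returns the exit code of the soffice process.
    public static int convert(String sofficeBinary, Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, input, outputDir))
                .inheritIO() // show LibreOffice's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: SofficeConvert <input.doc|input.docx> <output-dir>");
            return;
        }
        // Assumes "soffice" resolves on the PATH; pass a full path otherwise.
        int exitCode = convert("soffice", Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with code " + exitCode);
    }
}
```

LibreOffice writes the PDF into the output directory under the input file's base name, so the same call covers both .doc and .docx input. For a long-running server, jodconverter is probably the better fit, since it manages the office process for you.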

JasonPlutext
2

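Cribbing off the POI unit tests, I came up with this to extract the text from a Word document:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;
import org.apache.commons.io.IOUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Assumes `document` is a zip archive whose first entry is the .doc file;
// for a plain .doc file, pass the FileInputStream directly to HWPFDocument.
public String getText(String document) {
    try (ZipInputStream is = new ZipInputStream(new FileInputStream(document))) {
        // Position the stream at the first entry of the archive.
        is.getNextEntry();

        // Buffer the entry in memory so POI can read it as a complete stream.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        IOUtils.copy(is, baos);

        ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
        HWPFDocument doc = new HWPFDocument(bais);
        WordExtractor extractor = new WordExtractor(doc);
        return extractor.getText();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}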

I do hope that points you in the right direction, if not sorts you entirely.

Habebit
hd1
0

You can use jWordConvert for this.

jWordConvert is a Java library that can read and render Word documents natively to convert to PDF, to convert to images, or to print the documents automatically.

Details can be found at the following link: http://www.qoppa.com/wordconvert/

Jabir
0

This GitHub repository might be helpful: https://github.com/guptachunky/Conversion-Work

https://github.com/guptachunky/Conversion-Work/blob/main/src/main/java/com/conversion/Conversion/Service/ConversionService.java

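// Imports assume XDocReport 2.x (fr.opensagres.*); 1.x used org.apache.poi.xwpf.converter.pdf.
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import javax.servlet.http.HttpServletResponse;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

// FileDetail, AppConstant and converterToFile are defined elsewhere in the linked project.
public void docToPdf(FileDetail fileDetail, HttpServletResponse response) {
    try {
        File docFile = converterToFile(fileDetail.getFile());
        File file = File.createTempFile("output", ".pdf");
        // XWPFDocument reads the OOXML (.docx) format only; a binary .doc
        // has to be converted to .docx before it reaches this method.
        try (InputStream doc = new FileInputStream(docFile);
             XWPFDocument document = new XWPFDocument(doc);
             OutputStream out = new FileOutputStream(file)) {
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        }
        // The PDF stream is closed at this point, so the file is complete on disk.
        getClaimFiles(file, response);
    } catch (IOException e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}

// Sends the generated PDF back to the client as a download.
public void getClaimFiles(File file, HttpServletResponse response) {
    try {
        response.setContentType("application/pdf");
        response.setHeader("Content-Disposition",
                "attachment; filename=dummy.pdf");
        response.getOutputStream().write(Files.readAllBytes(file.toPath()));
    } catch (Exception e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}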