
I am looking for a stable method to convert DOCX files from MS Word into PDF. Until now I have used OpenOffice installed as a listener, but it often hangs. The problem is that we have situations where many users want to convert SXW and DOCX files into PDF at the same time. Is there some other possibility? I tried the examples from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ but the output is not good (the converted documents have errors and the layout is noticeably changed).

Here is the source DOCX document: [image]

Here is the document converted with docx4j, with some exception text inside the document. The text in the upper right corner is also missing: [image]

This one is the PDF created with OpenOffice as the DOCX-to-PDF converter. Some text in the upper right corner is missing: [image]

Is there some other option to convert docx into pdf with Java?

Ferguson
  • Not on SO, since you would be asking us "to recommend a tool or library". But why not just try to get your OpenOffice setup stable? – Stefan Hegny Dec 13 '16 at 10:44
  • You can use JODConverter (https://code.google.com/archive/p/jodconverter/) or docx4j (http://www.docx4java.org/trac/docx4j) – Davide Dec 13 '16 at 11:35
  • JODConverter uses OpenOffice in the background. The problem is that OpenOffice sometimes hangs (crashes) without any apparent reason. I also tried docx4j (see my question). – Ferguson Dec 13 '16 at 11:39
  • That's a 4 year old article you reference there. These days, the recommended way to do it from docx4j is with Plutext's commercial PDF Converter. You can try that online at http://converter-eval.plutext.com/ – JasonPlutext Dec 13 '16 at 12:57

1 Answer

There are many ways to do the conversion. One commonly used method is POI and docx4j:

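    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Collections;
    import java.util.List;
    import org.apache.log4j.Level;
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.fonts.Mapper;
    import org.docx4j.fonts.PhysicalFont;
    import org.docx4j.fonts.PhysicalFonts;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    // The statements below go inside a method declared with "throws Exception".
    // Load the DOCX file.
    InputStream is = new FileInputStream(new File("your Docx PAth"));
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);

    // Read each section's page dimensions (the returned value is not used here).
    for (int i = 0; i < wordMLPackage.getDocumentModel().getSections().size(); i++) {
        wordMLPackage.getDocumentModel().getSections().get(i).getPageDimensions();
    }

    // Map a font named in the document to a font installed on this machine.
    Mapper fontMapper = new IdentityPlusMapper();
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS"); // set your desired font
    fontMapper.getFontMappings().put("Algerian", font);
    wordMLPackage.setFontMapper(fontMapper);

    // Set up the conversion via XSL-FO.
    PdfSettings pdfSettings = new PdfSettings();
    org.docx4j.convert.out.pdf.PdfConversion conversion =
            new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

    // Turn off log4j output.
    List<Logger> loggers = Collections.<Logger> list(LogManager.getCurrentLoggers());
    loggers.add(LogManager.getRootLogger());
    for (Logger logger : loggers) {
        logger.setLevel(Level.OFF);
    }

    // Write the PDF, then release both streams.
    OutputStream out = new FileOutputStream(new File("Your OutPut PDF path"));
    conversion.output(out, pdfSettings);
    out.close();
    is.close();
    System.out.println("DONE!!");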

This works well for me, and I have tried it on multiple DOCX files.
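
Since the question mentions many users converting at the same time and a converter that sometimes hangs, one possible safeguard is to run each conversion on a bounded thread pool with a per-job timeout. The following is only a sketch using the standard java.util.concurrent classes. BoundedConverter, convert and shutdown are hypothetical names that are not part of docx4j or OpenOffice, and the job passed in is assumed to be whatever conversion code you already use:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: caps parallel conversions and gives up on jobs that take too long. */
class BoundedConverter {
    private final ExecutorService pool;
    private final long timeoutSeconds;

    BoundedConverter(int maxParallel, long timeoutSeconds) {
        // At most maxParallel jobs run at once; further jobs wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(maxParallel);
        this.timeoutSeconds = timeoutSeconds;
    }

    /** Runs the job and returns true on success, false on timeout or failure. */
    boolean convert(Callable<Void> job) throws InterruptedException {
        Future<Void> future = pool.submit(job);
        try {
            // Waits up to timeoutSeconds, counted from now (includes time spent queued).
            future.get(timeoutSeconds, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupts the worker thread; the job itself must react to the interrupt.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The job threw an exception.
            return false;
        }
    }

    /** Stops the pool and interrupts any running jobs. */
    void shutdown() {
        pool.shutdownNow();
    }
}
```

Note that cancel(true) only interrupts the Java thread. If the job is blocked on an external process such as an OpenOffice listener, that process would still have to be restarted separately.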

KishanCS
  • Tried your method but I still get an exception: WARN org.apache.fop.image.loader.batik.PreloaderSVG .preloadImage line 76 - Batik not in class path java.lang.NoClassDefFoundError: org/apache/batik/bridge/UserAgent at org.apache.fop.image.loader.batik.PreloaderSVG.preloadImage(PreloaderSVG.java:69) – Ferguson Dec 13 '16 at 11:36
  • import org.apache.log4j.Level; import org.apache.log4j.LogManager; import org.apache.log4j.Logger; import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings; import org.docx4j.fonts.IdentityPlusMapper; import org.docx4j.fonts.Mapper; import org.docx4j.fonts.PhysicalFont; import org.docx4j.fonts.PhysicalFonts; import org.docx4j.openpackaging.packages.WordprocessingMLPackage; – KishanCS Dec 13 '16 at 11:41
  • I still get the same malformed PDF as with docx4j. Here it is: http://s5.postimg.org/ptxrxtfyf/screenshot_1540.jpg – Ferguson Dec 13 '16 at 11:44
  • //To turn off logger: List<Logger> loggers = Collections.<Logger> list(LogManager.getCurrentLoggers()); loggers.add(LogManager.getRootLogger()); for (Logger logger : loggers) { logger.setLevel(Level.OFF); } This turns off those messages. – KishanCS Dec 13 '16 at 11:45
  • I will try to turn off the logging, but the text in the upper right corner, the footer, etc. are missing in the PDF document... – Ferguson Dec 13 '16 at 11:53
  • Was the docx originally created in Word, or was it converted from another format? Please check. – KishanCS Dec 13 '16 at 11:55
  • If possible, provide the docx file. – KishanCS Dec 13 '16 at 11:58
  • It's a document created in MS Word (Office Professional 2013): http://s5.postimg.org/63a55ovlz/screenshot_1541.jpg If you want to try it, here is my document: https://drive.google.com/file/d/0B6Z9wNTXyUEeOUtFRVhZeWtnZ3M/view?usp=sharing – Ferguson Dec 13 '16 at 12:02
  • Check all dependencies once and rebuild the project. It works like a charm! Thank you – KishanCS Dec 13 '16 at 12:12
  • Can you please send me a link with all the included libraries? I downloaded the libraries from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ – Ferguson Dec 13 '16 at 12:33
  • Also, if I download the latest library from docx4java, I can't find the class org.docx4j.convert.out.pdf.PdfConversion – Ferguson Dec 13 '16 at 12:42
  • The code sample in this answer uses docx4j, not POI :-) – JasonPlutext Dec 13 '16 at 12:59
  • In the most recent docx4j, the export via XSL FO is a separate library, so you'd need that jar and its dependencies. Or use our commercial PDF Converter I recommended in my other comment :-) – JasonPlutext Dec 13 '16 at 13:00
  • Hi JasonPlutext. I have tried your online converter, but in the generated PDF there is no image in the lower left corner: http://s5.postimg.org/k5w2ko0zr/screenshot_1542.jpg and this is the original document: http://s5.postimg.org/8utewau4n/screenshot_1543.jpg Any idea? – Ferguson Dec 13 '16 at 13:18
  • Would need to see the source docx. Can you email it to me, or drag it to http://ndoc.it and paste the resulting link here? – JasonPlutext Dec 15 '16 at 05:17