
I want to convert a DOCX file that contains EMF pictures to a PDF file. Apache POI detects the EMF picture type, but the converter uses the com.lowagie.text.Image class when writing the PDF, and unfortunately that class doesn't support the EMF format. Do you have any idea how I can replace the EMF pictures with JPG/GIF/BMP versions, which are fully supported?
org.apache.poi.xwpf.converter.pdf version: 1.0.5

// try-with-resources closes both streams even if the conversion fails
try (InputStream fis = new FileInputStream("file.docx");
     OutputStream out = new FileOutputStream(new File("file.pdf"))) {
    XWPFDocument document = new XWPFDocument(OPCPackage.open(fis));
    PdfOptions options = PdfOptions.create().fontEncoding("windows-1250");
    PdfConverter.getInstance().convert(document, out, options);
}

The code above gives an error:

Dec 21, 2015 10:26:56 AM org.apache.poi.xwpf.converter.pdf.internal.PdfMapper visitPicture
SEVERE: The byte array is not a recognized imageformat.
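
To confirm that the picture bytes behind this error really are EMF, the header can be checked by hand. A minimal sketch, assuming the standard EMF header layout (record type 1 at offset 0, the signature 0x464D4520 at offset 40); the class and method names are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EmfSniffer {
    // An EMF file starts with an EMR_HEADER record (type 1) and carries
    // the signature 0x464D4520 (" EMF" in ASCII) at byte offset 40.
    private static final int EMR_HEADER = 1;
    private static final int EMF_SIGNATURE = 0x464D4520;
    private static final int SIGNATURE_OFFSET = 40;

    /** Returns true if the bytes look like an EMF file, judging by the header only. */
    public static boolean looksLikeEmf(byte[] data) {
        if (data == null || data.length < SIGNATURE_OFFSET + 4) {
            return false;
        }
        // All EMF header fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(0) == EMR_HEADER
                && buf.getInt(SIGNATURE_OFFSET) == EMF_SIGNATURE;
    }
}
```

With POI, the bytes to test would presumably come from XWPFPictureData.getData() for each entry of document.getAllPictures(); pictures that pass this check are the ones that need converting before the PDF step.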

iblis
  • Your description doesn't seem accurate, but you might find this link helpful -> http://pdfdownload19.blogspot.in/2015/06/how-to-add-clipart-images-to-pdf-in.html – Avis Dec 21 '15 at 11:43
  • I've added WMF support in the latest POI (3.14), but this is currently only used for slideshows. For EMF you can try to use FreeHep. – kiwiwings Mar 10 '16 at 10:20

1 Answer


Sadly, POI does not come with dedicated support for handling WMF/EMF. These formats turn up a lot, though: since the Windows GDI provides native support for rendering them, Word uses them as "preview images" (esp. for embedded OLE objects) all the time.

For WMF you may be able to succeed using Batik. See here. For EMF there is currently no (free) Java implementation AFAIK. All you can do is

  1. implement it yourself using this spec,
  2. write something (platform-dependent) on top of the GDI, or
  3. (simple solution) feed the extracted EMFs back into Word (or any other member of the Office family, such as PowerPoint/Visio) and batch process them into PNG using VBA.
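
For option 3 you first need the EMFs as files. A DOCX is just a ZIP archive, so a rough sketch of the extraction step with plain java.util.zip could look like this. It assumes the pictures sit under word/media/ with an .emf extension, which is where Word normally puts them but is not guaranteed; EmfExtractor and extractEmfs are made-up names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class EmfExtractor {
    /** Copies every word/media/*.emf entry of the DOCX into outDir and returns the written files. */
    public static List<Path> extractEmfs(Path docx, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: embedded pictures live under word/media/ and end in .emf
                if (entry.isDirectory()
                        || !name.startsWith("word/media/")
                        || !name.toLowerCase(Locale.ROOT).endsWith(".emf")) {
                    continue;
                }
                // Use only the file name so an entry cannot be written outside outDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path target = outDir.resolve(fileName);
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

Calling EmfExtractor.extractEmfs(Paths.get("file.docx"), Paths.get("emf-out")) would leave the EMF files in emf-out, ready to be batch-converted to PNG. Putting the converted pictures back into the document is a separate step that this sketch does not cover.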
morido