I am trying to use Apache POI to extract embedded equations and text from an MS Word .doc file into an MS PowerPoint .ppt file. I have successfully extracted the text, but how do I extract the embedded equations?

The embedded equations come out like this if I extract them only as text:

!!EMBED Equation.3
Danilo Piazzalunga
CarlLee

1 Answer

This may not help you with the binary .doc format, but for the newer .docx format, I was able to get to the equation, which is embedded as an OLE document, using the following code:

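 // f is the .docx file to read.
 try (InputStream in = new FileInputStream(f)) {
   XWPFDocument doc = new XWPFDocument(in);
   // Newer POI releases name this method getAllEmbeddedParts().
   for (PackagePart p : doc.getAllEmbedds()) {
     // Each equation is stored as an OLE2 compound document.
     POIFSFileSystem poifs = new POIFSFileSystem(p.getInputStream());
     // The "Equation Native" stream holds the MathType data.
     byte[] oleData = IOUtils.toByteArray(
               poifs.createDocumentInputStream("Equation Native"));
   }
 }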

You can then extract the MathType data from that stream and hand it to an MTEF parser.
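As a rough sketch of that extraction step: the "Equation Native" stream normally starts with a small header (the EQNOLEFILEHDR structure, 28 bytes, little-endian) and the MTEF data follows it. The class name `EquationNative` and the method `mtefBytes` below are hypothetical names chosen for illustration; they are not part of POI, and the header layout is an assumption based on the published MathType storage format, so verify it against your own files.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EquationNative {
    // Assumed size of the EQNOLEFILEHDR structure that precedes the MTEF data.
    private static final int HEADER_SIZE = 28;

    /** Returns the MTEF bytes that follow the header of an "Equation Native" stream. */
    public static byte[] mtefBytes(byte[] oleData) {
        if (oleData.length < HEADER_SIZE) {
            throw new IllegalArgumentException("stream shorter than the 28-byte header");
        }
        ByteBuffer buf = ByteBuffer.wrap(oleData).order(ByteOrder.LITTLE_ENDIAN);
        int cbHdr = buf.getShort() & 0xFFFF;          // header length, expected to be 28
        buf.getInt();                                 // version field (skipped)
        buf.getShort();                               // clipboard format (skipped)
        long cbObject = buf.getInt() & 0xFFFFFFFFL;   // length of the MTEF data
        if (cbHdr != HEADER_SIZE || cbHdr + cbObject > oleData.length) {
            throw new IllegalArgumentException("unexpected Equation Native header");
        }
        // Copy only the MTEF payload, leaving the header behind.
        return Arrays.copyOfRange(oleData, cbHdr, cbHdr + (int) cbObject);
    }
}
```

With the `oleData` array from the loop above, `EquationNative.mtefBytes(oleData)` would give the bytes to pass to whichever MTEF parser you use.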

If you don't need the MathType data, there is also a placeholder image (in WMF format) that just renders the equation.

Community
Thilo
  • Thank you, though I don't need it anymore. – CarlLee Aug 10 '12 at 09:57
  • @Thilo Could you please take a look at this question also? http://stackoverflow.com/questions/35418453/how-can-i-add-embedded-equations-to-docx-files-by-using-apache-poi – MJBZA Feb 15 '16 at 20:31