We are trying to write Java code that reads a Word document (docx file) using Apache POI. I took a hint from the solution in "Reading equations from Word (*.docx) to HTML together with their text context using apache poi". I have added the required dependency:

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>5.0.0</version>
</dependency>

I have imported the exact CTP and CTOMath classes:

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;

I changed the XSL file to MML2OMML.XSL, which I got from the Microsoft Office folder on Windows. Earlier I used the XSL file described in that solution (OMML2MML.XSL), which gave me more errors.

The problem is that I am not getting the desired MathML output.

My input was a Word document like this:

[image: screenshot of the input Word document]

My output was:

  • mathml for formula 1:
  • mathml for formula 2:
  • Text: Hi this is Nikhil

Could somebody please help me here? Any suggestions are appreciated. Thanks in advance.

  • Just tested my code in linked answer using `apache poi 5.0.0` and it works. You need `OMML2MML.XSL` as `MML2OMML.XSL` converts the other way around. And you need `poi-ooxml-full-5.0.0.jar` for `apache poi 5.0.0`. – Axel Richter Jul 14 '21 at 06:20
  • @Axel Richter I included all the same requirements, such as `poi-ooxml-full-5.0.0.jar` for `apache poi 5.0.0`. But I had to use `TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();` because the XSL transformer did not support the XSLT version used by the script, so I switched to Saxon. Now I am getting `mathMLList:[x+an=k=0nnkxkan-k, a2+b2=c2]`. It's in LaTeX format, not MathML. Could you kindly help? Thanks in advance. – Nikhil Cherian Jul 15 '21 at 05:56
  • No, sorry, then I cannot help. My answer is tested and works using `javax.xml.transform`. This is the default Java transformer API. Why not use this? But your result looks as if the browser simply is not able to show `MathML` properly. Have a look at the source of the `result.html`. There you will find properly coded `MathML`, I bet. – Axel Richter Jul 15 '21 at 06:10
  • I am sorry. It does not work with `javax.xml.transform`, which is why I had to switch to Saxon. I looked into the source and found this:

    x+an=k=0nnkxkan-k

    a2+b2=c2

    Hi this is nikhil

    The equations look like plain text, with no markup around them.
    – Nikhil Cherian Jul 16 '21 at 14:22
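
One possible explanation for the tag-free output, offered as a sketch and not a confirmed diagnosis: when no template in a stylesheet matches the input elements, XSLT's built-in template rules simply copy the text nodes through, so everything comes out as bare text. A stylesheet that converts in the other direction (as the first comment says `MML2OMML.XSL` does) would match MathML elements and not the OMML elements in the document, which could produce exactly this kind of result. The example below reproduces the effect with the JDK's default `javax.xml.transform` only. The class name `WrongDirectionDemo`, the namespaces `urn:example:source` and `urn:example:other`, and the one-template stylesheet are all hypothetical stand-ins; this is not the real `OMML2MML.XSL` and it does not use Apache POI.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WrongDirectionDemo {

    // Hypothetical stand-in for an equation in the source vocabulary.
    static final String INPUT =
        "<s:eq xmlns:s=\"urn:example:source\"><s:r>a</s:r><s:r>2</s:r></s:eq>";

    // Builds a toy stylesheet whose only template matches <r> in the given namespace
    // and wraps its text in an <mi> element.
    static String stylesheet(String matchNamespace) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
             + " xmlns:in=\"" + matchNamespace + "\" exclude-result-prefixes=\"in\">"
             + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
             + "<xsl:template match=\"in:r\">"
             + "<mi><xsl:value-of select=\".\"/></mi>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Applies the stylesheet to the XML using the default JDK transformer.
    static String transform(String xsl, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Template namespace matches the input: the result contains <mi> markup.
        System.out.println(transform(stylesheet("urn:example:source"), INPUT));
        // Template namespace does not match: built-in rules emit only the text nodes.
        System.out.println(transform(stylesheet("urn:example:other"), INPUT));
    }
}
```

If the real pipeline behaves the same way, a quick check would be to print the transformer's raw output string before it goes anywhere near a browser: if it contains no `<` at all, the stylesheet probably never matched the input, and the file to try again is `OMML2MML.XSL`.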
