
We have a Word (.docx) file that contains equations. POI's XWPFWordExtractor.getText() doesn't return the equations.

My questions are:

  1. How are these equations represented in the file?
  2. How do I read them? (I eventually want to display them on an HTML page, perhaps as MathML.)

Thanks!

Chinmay

1 Answer


An equation in a docx file is represented using OMML (Office Math Markup Language), as m:oMathPara/m:oMath:

  <m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
    <m:oMath> <!-- equation content goes here --> </m:oMath>
  </m:oMathPara>

I don't know about POI, but in docx4j, elements in that namespace are represented by JAXB-generated objects in the org.docx4j.math package.

I'd tackle your second question by marshalling the m:oMathPara/m:oMath to XML, then transforming it via omml2mathml.xsl. See further Murray Sargent's blog (for example here and here).
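If you'd rather not depend on either library for this step, the same idea can be sketched with only the JDK. This is a rough sketch, not POI's or docx4j's API: the class and method names (OmmlReader, findEquations, apply) are made up for illustration, and it assumes the main document part is stored at word/document.xml, which is the usual location but is formally determined by the package relationships.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, named for illustration only.
public final class OmmlReader {
    static final String M_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Returns every m:oMath element found in the main document part.
    // Assumption: that part lives at word/document.xml inside the zip.
    static List<Element> findEquations(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true); // needed to match on the m: namespace
                Document doc = dbf.newDocumentBuilder().parse(zip);
                NodeList nodes = doc.getElementsByTagNameNS(M_NS, "oMath");
                List<Element> equations = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    equations.add((Element) nodes.item(i));
                }
                return equations;
            }
        }
        return new ArrayList<>(); // no main document part found
    }

    // Runs one equation through the given transformer and returns the output
    // as a string. With a transformer built from an OMML-to-MathML stylesheet
    // the result would be MathML; with an identity transformer it is the
    // serialized OMML itself.
    static String apply(Transformer transformer, Element equation) throws Exception {
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(equation), new StreamResult(out));
        return out.toString();
    }
}
```

To get MathML you would build the transformer from the stylesheet, for example TransformerFactory.newInstance().newTransformer(new StreamSource(new File(pathToStylesheet))), where pathToStylesheet is a placeholder for wherever your copy of the XSL file lives. I'm assuming the stylesheet runs under the JDK's built-in XSLT 1.0 processor; I haven't verified that here.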

JasonPlutext