
I'm trying to use python-docx to read a Microsoft Word document, but I can't find any function or method in its API for equations. If I want to read the formulas in a Word document, what do you advise?

Update: I tried printing every text attribute reachable from the document object with the following code, and it doesn't show any formula information at all.

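from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

from docx import Document

MAX_DEPTH = 8
seen = {}  # id -> object; holding a reference stops ids from being reused


def object_walk(obj, stack):
    """Recursively visit public attributes of obj, printing every 'text' value found."""
    if obj is None or len(stack) >= MAX_DEPTH or id(obj) in seen:
        return
    seen[id(obj)] = obj
    if isinstance(obj, (str, bytes)):
        return  # strings are iterable, but walking their characters is pointless

    # Visit the items of iterables (e.g. document.paragraphs) once, not once per attribute.
    if isinstance(obj, Iterable):
        try:
            for i, item in enumerate(obj):
                object_walk(item, stack + ["[%d]" % i])
        except Exception:
            pass  # some objects claim to be iterable but fail when iterated

    for attr in (name for name in dir(obj) if not name.startswith("_")):
        try:
            value = getattr(obj, attr)
        except Exception:
            continue  # some properties raise when accessed
        if attr == "text":
            print(value, "============", stack)
        object_walk(value, stack + [attr])


document = Document("demo.docx")
object_walk(document, ["root"])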
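The walk finds nothing because Word stores equations as Office Math Markup Language (OMML) elements (m:oMath) inside word/document.xml, and python-docx does not model them: paragraph.text is built from ordinary text runs, so equation content is skipped. A minimal sketch, assuming only the standard library, that opens the .docx as a ZIP archive and returns the raw characters of each m:oMath element (the function name extract_equations is hypothetical; the output is flat text such as "A=2r", not LaTeX):

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of Office Math Markup Language (OMML), used by Word for equations.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_equations(path):
    """Return the plain text of every equation (m:oMath element) in a .docx file.

    path may be a filename or a binary file object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    equations = []
    for omath in root.iter(f"{{{M_NS}}}oMath"):
        # The characters of an equation live in m:t elements inside m:r runs.
        text = "".join(t.text or "" for t in omath.iter(f"{{{M_NS}}}t"))
        equations.append(text)
    return equations
```

Usage: extract_equations("demo.docx") returns one string per equation, in document order.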
FavorMylikes
  • Did you try anything? [Extract text from document](http://stackoverflow.com/questions/25228106/how-to-extract-text-from-an-existing-docx-file-using-python-docx). Once you can read it, it's probably just a matter of encoding. – Simon Jul 26 '16 at 04:37
  • Thanks for your response. I checked whether there is any content in document.paragraphs: if I create a Word file that contains only a formula, document.paragraphs gives me just a "\n". – FavorMylikes Jul 26 '16 at 04:55
  • @FavorMylikes did you manage to find a solution for this? I'm trying to extract formulas from a docx too. – snowflake Dec 18 '18 at 11:23
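The observation that a formula-only file yields just a "\n" matches the XML layout: a paragraph holding only an equation contains no w:t text, so its python-docx text is empty. A minimal sketch, assuming only the standard library, that walks each body paragraph in document order and keeps equations inline; paragraphs_with_math and the "$" delimiter are illustrative choices, not part of python-docx:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace (ordinary text) and OMML namespace (equations).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def _collect(elem, parts, delimiter):
    """Append the text under elem to parts, in document order."""
    if elem.tag == f"{{{M_NS}}}oMath":
        # Whole equation: join its m:t runs and wrap the result in delimiters.
        math = "".join(t.text or "" for t in elem.iter(f"{{{M_NS}}}t"))
        parts.append(f"{delimiter}{math}{delimiter}")
        return
    if elem.tag == f"{{{W_NS}}}t":
        # Ordinary text, which is what python-docx's paragraph text is mainly built from.
        parts.append(elem.text or "")
    for child in elem:
        _collect(child, parts, delimiter)


def paragraphs_with_math(path, delimiter="$"):
    """Return the text of each body paragraph, with equations kept inline."""
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    result = []
    for p in body.iter(f"{{{W_NS}}}p"):
        parts = []
        _collect(p, parts, delimiter)
        result.append("".join(parts))
    return result
```

Display equations (m:oMathPara wrapping m:oMath) are picked up too, because the walk descends into any element it does not recognize.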

1 Answer


I was facing the same issue and found a Python package, docxlatex, that also reads equations; its documentation is at https://hrus.in/docxlatex/. There is one limitation: the equations need to be in linear format. This is mentioned at the top of the documentation, which also explains how to convert an equation to linear format, so refer to it.
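As an alternative to docxlatex, OMML can be mapped to LaTeX by translating each math element into the matching LaTeX construct. The sketch below is not docxlatex's implementation; it is a hypothetical, standard-library-only converter (omml_to_latex is an illustrative name) that understands only fractions (m:f), superscripts (m:sSup), subscripts (m:sSub) and radicals (m:rad), drops property elements, and flattens everything else to its text:

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def _m(name):
    """Qualified tag name in the OMML namespace."""
    return f"{{{M_NS}}}{name}"


def _arg(elem, name):
    """Convert the named child (e.g. m:num) of elem, or '' if it is absent."""
    child = elem.find(_m(name))
    return omml_to_latex(child) if child is not None else ""


def omml_to_latex(elem):
    """Convert a small subset of OMML to LaTeX; unknown elements are flattened."""
    tag = elem.tag
    if tag == _m("t"):
        return elem.text or ""
    if tag == _m("f"):  # fraction: numerator m:num over denominator m:den
        return r"\frac{%s}{%s}" % (_arg(elem, "num"), _arg(elem, "den"))
    if tag == _m("sSup"):  # superscript: base m:e, exponent m:sup
        return "{%s}^{%s}" % (_arg(elem, "e"), _arg(elem, "sup"))
    if tag == _m("sSub"):  # subscript: base m:e, index m:sub
        return "{%s}_{%s}" % (_arg(elem, "e"), _arg(elem, "sub"))
    if tag == _m("rad"):  # radical: optional degree m:deg, radicand m:e
        deg, body = _arg(elem, "deg"), _arg(elem, "e")
        return r"\sqrt[%s]{%s}" % (deg, body) if deg else r"\sqrt{%s}" % body
    if tag.endswith("Pr"):
        return ""  # property elements (m:fPr, m:rPr, w:rPr, ...) hold formatting only
    # Anything else (m:oMath, m:r, m:e, m:num, ...): concatenate the children.
    return "".join(omml_to_latex(child) for child in elem)
```

Each m:oMath element found in word/document.xml can be passed to omml_to_latex; constructs outside the four handled ones (matrices, n-ary operators, delimiters) come out as plain text and would need their own branches.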

NotKashish
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Aug 01 '22 at 12:25