I have a .docx file with the following content:

The answer is:

[![equation][1]][1]

It has two lines: the first is the text "The answer is:" and the second is an equation, [![equation][1]][1], which is an image. In Python, I read the file as follows:

with open('text.docx', 'rb') as f:
    c = f.read()

Is there a way to look for the character '\n' in the bytes c, so that I can break c into two parts: a text part and an image part?

Dr Linh Chi Nguyen
  • Please read the file as text, not as raw bytes. A character is not a byte, and is not contained within the bytes of the file. Bytes are raw data; text is *one possible interpretation* of that data. "So that i can break c into two part: text part and image part?" The file **does not contain** an image. "an equation `[![equation][1]][1]` which is an image." No, it isn't. It is **text**. – Karl Knechtel Sep 01 '22 at 04:14
  • I uploaded the image but it doesn't show. If I read the file as text, the image is lost. – Dr Linh Chi Nguyen Sep 01 '22 at 04:57
  • You cannot just "read" any kind of file you want and expect to pull out whatever "images", "text" etc. you expect the file to contain. Files **only** contain raw data. Everything after that requires *interpreting* the file contents, according to their *format*. In order to process a `.docx` file, you will realistically need a third-party library. We cannot recommend one here. [Please try](https://meta.stackoverflow.com/questions/261592) to [look for one yourself](https://duckduckgo.com/?q=read+docx+in+python). – Karl Knechtel Sep 01 '22 at 10:07
  • I tried to use this docx library, but after importing the file, the equation images and the subscripts are also lost. – Dr Linh Chi Nguyen Sep 01 '22 at 11:42
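
To make the point in the comments about "interpreting the file according to its format" concrete: a `.docx` file is a ZIP archive, so the raw bytes in `c` are mostly compressed data and a `b'\n'` found there does not mark a line break in the document. The body text normally lives in the archive member `word/document.xml`, and embedded pictures under `word/media/`. The following is only a minimal sketch using the standard library; the function name `split_docx` is made up for illustration, and it assumes the equation really is stored as a picture. An equation typed with Word's equation editor is usually stored as math markup inside `document.xml` instead, and would not appear under `word/media/`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def split_docx(source):
    """Return (paragraphs, media) for a .docx path or binary file object.

    paragraphs: one string per <w:p> paragraph; a paragraph that holds only
                a picture or an equation comes back as an empty string.
    media:      dict mapping archive names under word/media/ to raw bytes.

    Hypothetical helper for illustration, not part of any library.
    """
    with zipfile.ZipFile(source) as zf:
        # Parse the main document part and join the text runs of each paragraph.
        root = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = [
            "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
            for p in root.iter(f"{{{W_NS}}}p")
        ]
        # Collect embedded media files (images) as raw bytes.
        media = {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }
    return paragraphs, media
```

With the file from the question this would be called as `paragraphs, media = split_docx('text.docx')`. The text part is then in `paragraphs`, and any image bytes are in `media`, keyed by their name inside the archive. Subscripts and other formatting are stored as run properties in the XML, so joining the text runs like this drops them.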

0 Answers