19

Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

f = open('c:\file.doc', "w")
f.write(text)
f.close()

but Word will read it as an HTML file not a native .doc file.

Eli Courtwright
  • 186,300
  • 67
  • 213
  • 256
UnkwnTech
  • 88,102
  • 65
  • 184
  • 229

4 Answers4

37

See python-docx, its official documentation is available here.

This has worked very well for me.

Note that this does not work for .doc files, only .docx files.

ChaddRobertson
  • 605
  • 3
  • 11
  • 30
Damian
  • 2,588
  • 23
  • 19
  • 3
    But it is supported .doc format, i tried but it throws me a ValueError `ValueError: file '' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml` – Shashank Feb 26 '16 at 13:04
  • It is called python-docx not python-doc, so no. :) – Damian Feb 26 '16 at 17:52
  • 8
    @Damian but the question is also about .doc files, so you should note that your answer only applies to .docx files. – oulenz Aug 31 '18 at 07:21
  • 1
    Any idea for opening .doc files then? – linello May 22 '20 at 07:39
11

If you only what to read, it is simplest to use the linux soffice command to convert it to text, and then load the text into python:

Community
  • 1
  • 1
markling
  • 1,232
  • 1
  • 15
  • 28
7

I'd look into IronPython which intrinsically has access to windows/office APIs because it runs on .NET runtime.

Florian Bender
  • 161
  • 2
  • 4
auramo
  • 13,167
  • 13
  • 66
  • 88
3

doc (Word 2003 in this case) and docx (Word 2007) are different formats, where the latter is usually just an archive of xml and image files. I would imagine that it is very possible to write to docx files by manipulating the contents of those xml files. However I don't see how you could read and write to a doc file without some type of COM component interface.

fuentesjr
  • 50,920
  • 27
  • 77
  • 81