I am using pisa, which is an HTML to PDF conversion library for Python.

Is there an equivalent for Word documents, i.e. an HTML to .doc conversion library for Python?

Pang
Eric
  • Why would you want this? MS Word can read HTML. – MSalters Nov 19 '10 at 15:08
  • I have the same problem: I have HTML that I convert to PDF with pisa, and I want to do the same thing for Word. It's a big document (~20 pages), so being able to use the same code to generate the HTML and then export it through pisa or something else would be great. – Rafael Barros Jun 12 '12 at 17:24
  • @Eric: Recently, I had the same problem. Just wondering, did you find a solution to convert HTML to Word .docx? Thanks. – TTT Apr 08 '13 at 21:42
  • @tao.hong : Did you manage to solve your problem? I am looking for a suitable open source solution too. Thanks – sudshekhar Sep 04 '15 at 15:10

4 Answers

You could use win32com from the pywin32 extensions for Windows to let MS Word do the conversion for you (this only works on Windows with Word installed). A simple example:

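import os
import win32com.client

# Word resolves relative paths against its own current folder, not the script's, so pass absolute paths
src = os.path.abspath('example.html')
dst = os.path.abspath('example.doc')

word = win32com.client.Dispatch('Word.Application')
try:
    doc = word.Documents.Open(src)
    # FileFormat=0 is wdFormatDocument (.doc); use 16 (wdFormatDocumentDefault) for .docx
    doc.SaveAs(dst, FileFormat=0)
    doc.Close()
finally:
    word.Quit()  # make sure the Word process does not linger if the conversion fails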
Steven

I am not aware of a module that converts HTML to Word directly, but you can do it in two steps:

  1. Convert the HTML to plain text first using the html2text module.
  2. Then use the python-docx module to write that text to a .docx file (python-docx only writes .docx, not the older .doc format).
user225312

In case anybody else lands here trying to convert the other way around (Word to HTML), the code above works, but you need to change the FileFormat value.

http://msdn.microsoft.com/en-us/library/ff839952.aspx

For example, filtered HTML (wdFormatFilteredHTML) is 10 instead of 0.
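Rather than hard-coding the number, a small lookup can pick FileFormat from the target file's extension. This is a sketch covering only a few members of Word's WdSaveFormat enumeration; the names WD_SAVE_FORMAT and save_format_for are hypothetical, and any extension not listed raises an error instead of guessing.

```python
import os

# A few members of Word's WdSaveFormat enumeration, keyed by file extension.
WD_SAVE_FORMAT = {
    ".doc": 0,    # wdFormatDocument (Word 97-2003)
    ".rtf": 6,    # wdFormatRTF
    ".html": 10,  # wdFormatFilteredHTML (drops Office-specific markup)
    ".htm": 10,
    ".docx": 16,  # wdFormatDocumentDefault
    ".pdf": 17,   # wdFormatPDF
}


def save_format_for(path):
    """Hypothetical helper: return the FileFormat value for the extension of path."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return WD_SAVE_FORMAT[ext]
    except KeyError:
        raise ValueError(f"no WdSaveFormat mapping for {ext!r}") from None
```

With the win32com code above, saving as HTML would then be doc.SaveAs(os.path.abspath(dst), FileFormat=save_format_for(dst)) with dst = 'example.html'.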

Cooldox

For Python 3.x, the htmldocx package (pip install htmldocx) does this:

from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file("html_filename", "docx_filename")
# File extensions are not needed, but are tolerated
Synthase
  • any thoughts on how you'd use this if you're converting an HTML string rather than having to save an HTML file first? – Mansidak Mar 03 '23 at 17:38