41

I need to save HTML documents in memory as Word .DOC files.

Can anybody give me some links to both closed and open source libraries that I can use to do this?

Also, I should edit this question to add the language I'm using in order to narrow down the choices.

trejder
Mask
  • 6
    Anything is possible. How much cash do you want to spend on this? –  Oct 26 '09 at 12:20
  • 2
    Now it's a real question. –  Oct 26 '09 at 12:22
  • 1
    @Mask: Yes, please specify which language you're using. – Alan Nov 10 '11 at 17:47
  • If you want the Word document to look exactly like the HTML as rendered in a browser, it's going to be tough unless you parse the HTML and write it to Word format using libraries like Open Office XML etc.; otherwise you may need to invest in a commercial converter. I was looking for an OSS solution for this and later gave up and converted the HTML to PDF using wkhtmltopdf; if I need to edit the result, I can do it in Nitro PDF or Foxit :) – Deepu Nov 17 '14 at 11:26
  • 1
    @Mask You can try [Convert HTML to Well-Formatted Microsoft Word Document](https://weblogs.asp.net/dixin/convert-html-to-well-formatted-microsoft-word-document) - a detailed procedure written by a Microsoft employee who describes in detail how he converted his own online `LINQ via C# Tutorial` into a well-formed MS Word document. – nam Dec 31 '15 at 17:42

5 Answers

36

Try using pandoc

pandoc -f html -t docx -o output.docx input.html

If the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the input and output filenames.
— pandoc manual

So you can even use

pandoc -o output.docx input.html
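
Since the question is about HTML held in memory, here is a minimal Python sketch that pipes an HTML string to pandoc on standard input instead of reading `input.html`. It assumes pandoc is installed and on `PATH`; the function names `build_pandoc_command` and `html_to_docx` are made up for illustration. The `-f html` flag is passed explicitly because standard input has no file extension for pandoc to guess from.

```python
import shutil
import subprocess


def build_pandoc_command(output_path: str) -> list[str]:
    """Build the pandoc argument list for HTML on stdin to DOCX on disk."""
    # Same flags as the command line above, minus the input file:
    # with no input file given, pandoc reads from standard input.
    return ["pandoc", "-f", "html", "-t", "docx", "-o", output_path]


def html_to_docx(html: str, output_path: str) -> None:
    """Pipe an in-memory HTML string to pandoc and write a .docx file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(
        build_pandoc_command(output_path),
        input=html.encode("utf-8"),
        check=True,
    )
```

Calling `html_to_docx("<h1>Title</h1><p>Hello</p>", "output.docx")` should then produce the same kind of file as the command above, without writing the HTML to disk first.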
Jan
  • This works very well. As easy as sudo apt-get install -y pandoc (in Ubuntu) – Alejo Dev Aug 14 '15 at 16:08
  • I agree this is actually a good solution if you're after a bit more than what MS Word offers; it also gives you a general-purpose tool to use in other places (e.g. converting from HTML to PDF, etc.). That said, for a really basic option, what d4nt suggested works a treat as well :) – Anton Babushkin May 03 '16 at 06:30
  • This doesn't work in my case. – Beraliv Dec 13 '16 at 19:54
  • 1
    Could you please be more specific about that, @Beraliv? Do you get any error messages from pandoc? Is Word unable to open the document? – Jan Dec 16 '16 at 21:52
  • @Jan Alright, sorry for not explaining. What I meant is that the conversion isn't ideal: not all formulae are converted correctly, the styling comes out worse than I expected, and the text looks awful in places (offsets, fonts, etc.). And yes, I ask for a lot. – Beraliv Dec 17 '16 at 22:43
5

A good option is to use an API like Docverter, which lets you convert HTML to PDF or DOCX.

Armen
user1980965
5

Just paste this at the very top of your PHP page, before any other code or output.

<?php
// Headers must be sent before any output; the HTML that follows is served as a Word download.
header("Content-Type: application/msword");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header('Content-Disposition: attachment; filename="Hawala.doc"');
?>

This makes the browser download the page's HTML output as a .doc file, which Word can open. You can then customize the HTML according to your client's requirements.
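
For stacks other than PHP, the same trick can be sketched in Python as a framework-neutral helper that returns the response headers and body for an in-memory HTML string. This is only an illustration: the name `word_download_response` is hypothetical, and it uses `application/msword`, the registered media type for .doc files, which is an assumption on my part rather than something the PHP snippet requires.

```python
def word_download_response(
    html: str, filename: str = "document.doc"
) -> tuple[list[tuple[str, str]], bytes]:
    """Return (headers, body) that serve an HTML string as a Word download."""
    body = html.encode("utf-8")
    headers = [
        # Tell the browser to hand the file to Word rather than render it.
        ("Content-Type", "application/msword"),
        # Force a download under a .doc file name.
        ("Content-Disposition", f'attachment; filename="{filename}"'),
        ("Content-Length", str(len(body))),
        # Discourage caching, mirroring the PHP headers.
        ("Expires", "0"),
        ("Cache-Control", "must-revalidate, post-check=0, pre-check=0"),
    ]
    return headers, body
```

The returned pair can be passed to whatever web framework you use. Note that the file is still HTML under a .doc name, so Word does the actual interpretation when it opens it.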

SAR
4

When doing this I found it easiest to:

  1. Visit the page in a web browser
  2. Save the page using the web browser with .htm extension (and maybe a folder with support files)
  3. Start Word and open the saved .htm file (Word will open it correctly)
  4. Make any edits if needed
  5. Select Save As and then choose the format you would like: doc, docx, etc.
SnapShot
1

Alternatives to just renaming the file to .doc:

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word(office.11).aspx

The Word interop reference above is a good place to start. You can also try using Office Open XML:

http://www.ecma-international.org/publications/standards/Ecma-376.htm
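
To give a feel for the Office Open XML route, here is a minimal Python sketch that builds a .docx package in memory with only the standard library. It writes plain-text paragraphs, not HTML: mapping HTML elements to WordprocessingML is the part you would have to add yourself. The function name `paragraphs_to_docx_bytes` is hypothetical, and the package contains just the three parts a minimal document needs.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_to_docx_bytes(paragraphs: list[str]) -> bytes:
    """Build a minimal .docx in memory with one Word paragraph per string."""
    # Each paragraph is <w:p><w:r><w:t>text</w:t></w:r></w:p>, with XML escaping.
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    # A .docx file is a ZIP archive of XML parts.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()
```

The returned bytes can be written to a file ending in .docx or sent straight to a client, so nothing has to touch the disk.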

sleath