
I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution, but I see that its output format is HTML. The goal is to store user input and then serve it inside a Word document using python-docx, which does not accept HTML.

Do you know of any solution for this? It could be a feature of TinyMCE, an HTML-to-Word converter that keeps styles, or another rich text editor similar to TinyMCE.

UPDATE:

This is another option, which I found to be working fine. I'm still trying to convert HTML to Word without losing styles. One solution may be pywin32, as stated here, but it doesn't help me much and it's Windows only.

UPDATE 2:

After quite some digging I found pandoc and pypandoc, which appear to be able to convert to any of these output formats: "asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"

I haven't figured out how to integrate the converted output with python-docx.

Mike Vlad

1 Answer


I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work), then parse the HTML tags into a format python-docx can use. Pandoc and pypandoc will convert whole files for you (e.g. you start with a LaTeX file and need to convert it to Word), but they will not give you the tools you need to map XML/HTML content into a document you are building with python-docx.
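
Here is a minimal sketch of the idea, not a finished converter. To keep it dependency-free for the parsing step it uses the standard library's `html.parser` in place of Beautiful Soup; the same walk over tags works with either. The names `RunCollector`, `html_to_runs`, `write_docx`, `INLINE_STYLES` and `BLOCK_TAGS` are made up for this example, and only a handful of tags are handled. It assumes the editor emits simple markup such as `<p>`, `<b>`, `<i>` and `<u>`.

```python
from html.parser import HTMLParser

# Hypothetical mapping from inline HTML tags to run-level formatting flags.
INLINE_STYLES = {"b": "bold", "strong": "bold", "i": "italic",
                 "em": "italic", "u": "underline"}
# Tags treated as paragraph boundaries in this sketch (an assumption).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}


class RunCollector(HTMLParser):
    """Flatten editor HTML into paragraphs of (text, styles) runs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, each a list of runs
        self._current = []    # runs of the paragraph being built
        self._active = []     # styles opened by the enclosing inline tags

    def flush(self):
        # Close the paragraph in progress, if it has any content.
        if self._current:
            self.paragraphs.append(self._current)
            self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag in INLINE_STYLES:
            self._active.append(INLINE_STYLES[tag])

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag in INLINE_STYLES and INLINE_STYLES[tag] in self._active:
            self._active.remove(INLINE_STYLES[tag])

    def handle_data(self, data):
        # Skip whitespace between blocks, keep it inside a paragraph.
        if data.strip() or self._current:
            self._current.append((data, frozenset(self._active)))


def html_to_runs(html):
    """Return a list of paragraphs; each is a list of (text, styles)."""
    collector = RunCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    return collector.paragraphs


def write_docx(paragraphs, path):
    """Write the collected runs to a .docx file with python-docx."""
    from docx import Document  # third-party: pip install python-docx

    document = Document()
    for runs in paragraphs:
        paragraph = document.add_paragraph()
        for text, styles in runs:
            run = paragraph.add_run(text)
            run.bold = "bold" in styles
            run.italic = "italic" in styles
            run.underline = "underline" in styles
    document.save(path)
```

Usage would look like `write_docx(html_to_runs(stored_html), "out.docx")`, where `stored_html` is whatever your editor saved. Splitting the parse step from the write step keeps the tag mapping easy to inspect and extend (lists, headings, links) before anything touches the Word file.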

Good luck!

  • Thank you, Summernote looks very good. I guess I'll have to write the parser myself then. I knew about BS, but I hoped there was a ready-made solution for this. – Mike Vlad Jun 05 '18 at 05:33
  • I was surprised there wasn't an out-of-the-box solution as well. However, once you get the iterator working and get familiar with the dictionary of items, you'll find it works well. If you want a perfect formatting match (e.g. tabs, indents, etc.) it can become tedious, but you'll probably find you won't need most of it. – Scott Stanley Jun 08 '18 at 13:15