
I'm using the Python module docxtpl to fill in a docx template written with Jinja2. It works great so far, but I need to render some simple HTML in the docx, like this:

<h2>Some title</h2>
<h4>Lorem&nbsp;ipsum <strong>dolor sit amet</strong>, consectetur adipisicing elit. Possimus, aliquam,
    minima fugiat placeat provident optio nam reiciendis eius beatae quibusdam!</h4>
<p style="font-size: 18px;">The text is derived from Cicero's De Finibus Bonorum et Malorum (On the Ends of Goods
    and Evils, or alternatively [About] The Purposes of Good and Evil). The original passage began: Neque porro
    quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit (Translation: "Neither is
    there <del>anyone</del> who loves grief itself since it is grief and thus wants to obtain it").</p>
<table class="table">
    <tbody>
        <tr>
            <td>Test</td>
            <td>Test1</td>
            <td>Test2</td>
            <td>Test3</td>
        </tr>
        <tr>
            <td>Lorem</td>
            <td>Lorem1</td>
            <td>Lorem2</td>
            <td>Lorem3</td>
        </tr>
        <tr>
            <td>Ipsum</td>
            <td>Ipsum1</td>
            <td>Ipsum2</td>
            <td>Ipsum3</td>
        </tr>
    </tbody>
</table>

Unfortunately, I can't use RichText() from docxtpl to render tables and other HTML elements.

I have tried a few approaches, but I want to know whether there is a better way than, for example, merging a docx generated from HTML using htmldocx with the one generated by docxtpl, or using the python-docx module to copy the content from one docx into the other.

In the worst case, I am also willing to switch to JavaScript or Bash.

p_sutherland
Mario
  • I am dealing with this same issue. Did you ever find a solution within `docxtpl`? – p_sutherland Mar 29 '22 at 15:14
  • I'm sorry to say I haven't found a solution yet... – Mario Mar 30 '22 at 16:04
  • I am using the workaround suggested by @Synthase. Specifically, I use `docxtpl` for the majority of my document, then for the html-specific content I create a standalone html file, then use `htmldocx` to convert the html file to docx, and then append it back to the main docx using `docxcompose`. Not ideal, but it works for my purposes. – p_sutherland Mar 30 '22 at 18:13

2 Answers

from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file("html_filename", "docx_filename")
# File extensions are not needed, but tolerated

This should convert HTML to docx without trouble.

I'm not sure I understand your merge problem. The best approach would certainly be to write your template in HTML instead of docx. Then, once everything you need is in a single HTML file, you convert it to docx.

If you want to merge docx files or insert one into another, have a look here: How do I append new data to existing doc/docx file using Python
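One way to do such a merge, following the workaround described in the comments on the question (htmldocx for the conversion, docxcompose for the append), might look like the sketch below. The function name `append_html_to_docx` and its parameters are made up for illustration; the sketch assumes python-docx, htmldocx and docxcompose are installed, and it has not been tested against every kind of HTML.

```python
def append_html_to_docx(main_docx_path, html, output_path):
    """Append an HTML fragment, converted to docx content, to an existing docx file.

    Hypothetical helper: the name and signature are not part of any library.
    """
    # Third-party imports are kept inside the function so that defining it
    # does not require the packages to be installed.
    from docx import Document
    from docxcompose.composer import Composer
    from htmldocx import HtmlToDocx

    # Convert the HTML fragment into an in-memory python-docx document.
    html_doc = Document()
    HtmlToDocx().add_html_to_document(html, html_doc)

    # Append the converted document to the main one and save the result.
    composer = Composer(Document(main_docx_path))
    composer.append(html_doc)
    composer.save(output_path)
```

The main document would typically be the file already rendered by docxtpl, so the HTML content ends up at the end of the document rather than at an arbitrary position inside the template.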

Synthase
  • Yeah but... that's not quite what I'm looking for. As I said in the question: are there better solutions than merging 2 docx files generated in different ways? I need to use docxtpl or some module that allows templating. – Mario Apr 01 '21 at 09:25
  • That's what I said. You should generate an HTML template, not a docx one. Then you convert the whole HTML to docx at the end. – Synthase Apr 01 '21 at 09:32

I managed to accomplish this by using sub-documents in docxtpl together with htmldocx. This solution keeps the flexibility of a template and uses a subdoc only for the dynamic HTML that needs to be generated.

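import os
import shutil

from docx import Document
from docxtpl import DocxTemplate
from htmldocx import HtmlToDocx

# settings, template_path and local_result_path come from the surrounding project
template = DocxTemplate(template_path)
# Build the sub-document from the HTML
temp_dir = settings.SITE_ROOT + "/my_files/temps/"
os.makedirs(temp_dir, exist_ok=True)  # make sure the temp folder exists before saving
desc_document = Document()
new_parser = HtmlToDocx()
new_parser.add_html_to_document("<p>Html here</p>", desc_document)
desc_result_path = temp_dir + "subdoc.docx"
desc_document.save(desc_result_path)
# Load it as a subdoc; the template needs a matching tag such as {{p sub_doc }}
sub_doc = template.new_subdoc(desc_result_path)
context = {
    'sub_doc': sub_doc,
}
# Render and save the final document
try:
    template.render(context)
    template.save(local_result_path)
except Exception as e:
    print(e)
# Remove the temporary files
shutil.rmtree(temp_dir)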

Finally, I delete all the generated sub-document files with shutil.rmtree.
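As a possible alternative to creating and deleting the temp folder by hand, the standard-library tempfile module can own the directory, so it is removed even if rendering raises. This is only a sketch: the helper name `temp_subdoc_path` is hypothetical and not part of docxtpl or htmldocx.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_subdoc_path(filename="subdoc.docx"):
    """Yield a file path inside a fresh temporary directory.

    The directory and everything in it are deleted when the with-block
    exits, whether or not an exception was raised inside it.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield os.path.join(tmp_dir, filename)


# Intended use with the objects from the answer above (not executed here):
#
# with temp_subdoc_path() as desc_result_path:
#     desc_document.save(desc_result_path)
#     sub_doc = template.new_subdoc(desc_result_path)
#     template.render({'sub_doc': sub_doc})
#     template.save(local_result_path)
```

Rendering and saving stay inside the with-block, so the temporary file is still present for as long as the template might need it.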

Dency G B