I'm using the Jinja2 templating engine to create both HTML emails and their plaintext alternatives, which I then send out using SendGrid. Unfortunately for my lazy self, this means writing and maintaining two separate templates with essentially the same content: the .html file and the .txt file. The .txt file is identical to the HTML file except that it contains no HTML tags.

Is there any way to simply have the HTML template and then somehow dynamically generate the txt version, essentially just stripping the HTML tags? I know a regex could achieve this, but I also know that implementing a regex to deal with HTML tags is notoriously gotcha-ridden.

user1427661

1 Answer

I have used this trick to extract text from HTML, even when the HTML is broken:

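    # Python 2 only: the StringIO and htmllib modules do not exist in Python 3.
    import StringIO, htmllib, formatter

    text = get_some_html()  # placeholder: the rendered HTML as a string

    io = StringIO.StringIO()
    # DumbWriter writes the parsed text content into io as plain text.
    writer = formatter.DumbWriter(io)
    parser = htmllib.HTMLParser(formatter.AbstractFormatter(writer))
    parser.feed("<pre>" + text + "</pre>")
    text = io.getvalue()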

If you are sure your HTML is well-formed, you don't need those <pre> tags.
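
The `htmllib` and `StringIO` modules used above are gone in Python 3, so here is a rough sketch of the same idea built on the standard library's `html.parser` instead. This is a swapped-in technique, not the snippet above ported line for line. The names `TextExtractor` and `html_to_text`, and the choice of which tags count as line breaks, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags."""

    # Assumption: these tags should start a new line in the plaintext output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Assumption: the contents of these tags should never reach the output.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of html, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

With something like this you would render only the HTML template and pass the resulting string through `html_to_text` to get the plaintext part. Note that this sketch drops blank lines between paragraphs and discards link targets, so a hand-written .txt template may still read better for link-heavy emails.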

Dima Tisnek