1

I am making a script to refactor template-enhanced html.

I'd like beautifulsoup to print verbatim any code inside {{ }} without transforming > or other symbols into html entities.

It needs to split a file with many templates into many files, each with one template.

Spec: splitTemplates templates.html templateDir must:

  1. read templates.html
  2. for each <template name="xxx">contents {{ > directive }}</template>, write this template to file templateDir/xxx

Code snippet:

soup = bs4.BeautifulSoup(open(filename,"r"))
for t in soup.children:
    if t.name=="template":
        newfname = dirname+"/"+t["name"]+ext
        f = open(newfname,"w")
        # note f.write(t) fails as t is not a string -- prettify works
        f.write(t.prettify())
        f.close()

Undesired behavior:

{{ > directive }} becomes {{ &gt; directive }} but needs to be preserved as {{ > directive }}

Update: f.write(t.prettify(formatter=None)) sometimes preserves the > and sometimes changes it to &gt;. This seems to be the most obvious change, not sure why it changes some > but not others.

Solved: Thanks to Hai Vu and also https://stackoverflow.com/a/663128/103081

    import HTMLParser
    U = HTMLParser.HTMLParser().unescape
    soup = bs4.BeautifulSoup(open(filename,"r"))
    for t in soup.children:
        if t.name=="template":
            newfname = dirname+"/"+t["name"]+ext
            f = open(newfname,"w")
            f.write(U(t.prettify(formatter=None)))
            f.close()

See Also: https://gist.github.com/DrPaulBrewer/9104465

Community
  • 1
  • 1
Paul
  • 26,170
  • 12
  • 85
  • 119

1 Answers1

1

You need to unescape the HTML contents. To do that, use the HTMLParser module. This is not my original idea. I took it from:

How can I change '>' to '&gt;' and '&gt;' to '>'?

Community
  • 1
  • 1
Hai Vu
  • 37,849
  • 11
  • 66
  • 93