I am using Python to programmatically generate HTML. The HTML I want to generate is this:

<p>Hello <b>world</b> how are you?</p>

However, I do not know how to add the text "Hello" before the <b> tag and the string "how are you?" after it.

My code looks like this:

from xml.etree import ElementTree

p = ElementTree.Element('p')

b = ElementTree.Element('b')
b.text = 'world'

p.append(b)

Where would I add "Hello" and "how are you?" The paragraph element only has one p.text field, and there does not seem to be a way to intersperse text and other HTML tags when building the document.

How can I programmatically generate an HTML document with both tags and text mixed together?

Zach Young
poundifdef
  • XML ≠ HTML so why are you using `xml` module? – martineau Dec 19 '21 at 00:17
  • `ElementTree` is not the right tool for generating HTML. HTML is more free-form than XML. There are many template processors available that would be a better choice (jinja2, Cheetah, Django). Is there some reason you don't just want to generate strings here? – Tim Roberts Dec 19 '21 at 00:19
  • I am using the `xml` module because it was mentioned in this answer, however, I'm open to using a different library. https://stackoverflow.com/questions/6748559/generating-html-documents-in-python – poundifdef Dec 19 '21 at 00:20
  • For my application, I am parsing an existing tree data structure and converting it to an HTML representation. It seemed like the most straightforward thing would be to generate an HTML tree structure using an existing library, and then use that to render the final HTML. I could write my own, if that is required, but it would be nice if I could use an existing solution. – poundifdef Dec 19 '21 at 00:22
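
A rough sketch of the use case described in the last comment, walking an existing tree structure and rendering it as HTML with the standard library. The input format here (a node is either a string or a `(tag, attributes, children)` tuple) and the helper names `feed` and `to_html` are assumptions made up for illustration; the real data structure is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format (an assumption, not from the thread):
# a node is either a plain string or a (tag, attributes, children) tuple.
doc = ('p', {}, ['Hello ', ('b', {}, ['world']), ' how are you?'])


def feed(builder, node):
    """Recursively replay one node into a TreeBuilder."""
    if isinstance(node, str):
        builder.data(node)  # plain text between tags
        return
    tag, attrs, children = node
    builder.start(tag, attrs)
    for child in children:
        feed(builder, child)
    builder.end(tag)


def to_html(node):
    """Build an Element from the nested structure and serialize it as HTML."""
    builder = ET.TreeBuilder()
    feed(builder, node)
    root = builder.close()
    return ET.tostring(root, encoding='unicode', method='html')


html_text = to_html(doc)
print(html_text)
```

The recursion mirrors the shape of the input, so text and tags can be mixed at any depth. The sketch assumes the top-level node is a tuple, not a bare string.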

2 Answers

Regardless of how lenient/permissive the parsing of HTML by the rendering engine is, OP is asking how to responsibly build structured text.

Here's how to build the structure with ElementTree's TreeBuilder class; it's very straightforward:

#!/usr/bin/env python3
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start('p', {})
builder.data('Hello ')
builder.start('b', {})
builder.data('world')
builder.end('b')
builder.data(' how are you?')
builder.end('p')

root = builder.close()  # close to "finalize the tree" and return an Element

ET.dump(root)  # print the Element
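
A small variation on the code above, as a sketch: the second argument to `start()` is a dict of attributes, and `ET.tostring()` accepts `method='html'` to serialize with HTML rules instead of XML rules. The `style` value is borrowed from the comments below, and the variable name `markup` is only for illustration.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
# The second argument to start() is a dict of attributes for the element.
builder.start('p', {'style': 'font-weight: bold'})
builder.data('Hello ')
builder.start('b', {})
builder.data('world')
builder.end('b')
builder.data(' how are you?')
builder.end('p')
root = builder.close()

# method='html' serializes using HTML rules; encoding='unicode' returns a str.
markup = ET.tostring(root, encoding='unicode', method='html')
print(markup)
```

Passing the attribute dict explicitly, even when empty, keeps the code working on newer Python versions where `start()` requires both arguments.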

For what it’s worth, I see

<p>Hello <b>world…

as being very analogous to

<para>Hello <emphasis>world…

in Docbook XML.

Zach Young
  • `TreeBuilder` was exactly what I was looking for. How would you add "attributes" to this? In my original example, you might use `ElementTree.Element('p', attrib={"style": "font-weight: bold"})` – poundifdef Dec 19 '21 at 01:57
  • I added a link to the docs, the `start()` method also takes optional attributes. – Zach Young Dec 19 '21 at 02:00
  • The code in the answer does not quite work. You cannot omit the second argument to `start()` ("TypeError: start expected 2 arguments, got 1"). – mzjn Dec 19 '21 at 17:22
  • @mzjn I’m running 3.8.9 and that code works for me. If you look at OP’s response, it seems to work for them, too. What version of Python are you running? – Zach Young Dec 19 '21 at 18:23
  • I run Python 3.10.1. – mzjn Dec 19 '21 at 18:25
  • I just tried 3.8.2 at https://replit.com/languages/python3 and there was no error. This changeset seems related: https://github.com/python/cpython/commit/4edc95cf0a2960431621eee9bc194f6225f1690b (added in 3.9 as far as I can tell). – mzjn Dec 19 '21 at 18:39
  • @mzjn, great catch! I just built 3.10 and yeah, I see that error now. I've updated my answer's code. – Zach Young Dec 19 '21 at 20:47

You CAN do this, but you'd need to put the pieces of text into <span> tags. In my opinion, this is just a bad idea. HTML is not XML. There are much better tools.

import sys
from xml.etree import ElementTree as ET

html = ET.Element('html')
body = ET.Element('body')
html.append(body)
para = ET.Element('p')
b1 = ET.Element('span')
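b1.text = "Hello "  # trailing space: inline elements add no whitespace of their own
b2 = ET.Element('b')
b2.text = "world"
b3 = ET.Element('span')
b3.text = " how are you?"
para.append(b1)
para.append(b2)
para.append(b3)
body.append(para)  # the paragraph belongs inside <body>, not directly under <html>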

ET.ElementTree(html).write(sys.stdout, encoding='unicode', method='html')
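
For comparison with the span-based version above, here is a minimal sketch that uses the `text` and `tail` attributes every `Element` carries: `text` holds the content before the first child, and `tail` holds the content that follows an element's closing tag. The variable names are only illustrative.

```python
from xml.etree import ElementTree as ET

para = ET.Element('p')
para.text = 'Hello '            # text before the first child of <p>
bold = ET.SubElement(para, 'b')
bold.text = 'world'
bold.tail = ' how are you?'     # text after </b>, still inside <p>

result = ET.tostring(para, encoding='unicode', method='html')
print(result)
```

This keeps the markup free of wrapper elements, at the cost of having to remember that trailing text is stored on the preceding sibling and not on the parent.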
Tim Roberts