I'm going through the lxml tutorial and I have a question:

Here is the code:

>>> from lxml import etree
>>> html = etree.Element("html")
>>> body = etree.SubElement(html, "body")
>>> body.text = "TEXT"

>>> etree.tostring(html)
b'<html><body>TEXT</body></html>'
#############LOOK!!!!!!!############
>>> br = etree.SubElement(body, "br")
>>> etree.tostring(html)
b'<html><body>TEXT<br/></body></html>'
#############END####################

>>> br.tail = "TAIL"
>>> etree.tostring(html)
b'<html><body>TEXT<br/>TAIL</body></html>'

As you can see in the marked block, the statement br = etree.SubElement(body, "br") produces only a single <br/> tag rather than <br></br>. Why is that?

Is br a reserved word?

VELVETDETH
  • What do you mean by reserved word? There are very few [reserved words](http://stackoverflow.com/q/22864221/190597) in Python, and `br` is not one of them. – unutbu Oct 19 '14 at 10:17
  • I can't tell what you are asking here. How is this behaviour different from what you are expecting? – Daniel Roseman Oct 19 '14 at 10:21
  • `<br/>` is the [shorthand notation](http://www.w3.org/TR/xhtml1/#h-4.6) for the empty element `<br></br>`. Since `SubElement()` doesn't create *tags*, but *elements*, a complete element is what you get. – Lukas Graf Oct 19 '14 at 10:37
  • @LukasGraf Thank you so much! I think that's what I meant. I assumed lxml should treat all its subelements equally, so I was wondering why the printed result isn't `<br></br>`, given that the result for "body" is `<body></body>`. So the difference comes from the `tostring` function, not from `SubElement`? – VELVETDETH Oct 19 '14 at 10:51
  • @VELVETDETH I would assume so, yes. In the abstract definition of the tree, it's just another node (element). The shorthand notation for empty elements is just an aspect of representation. – Lukas Graf Oct 19 '14 at 10:59
  • Instead of editing the question to include the answer, please post the answer to the question and accept it so that the question doesn't remain unanswered. – user4815162342 Oct 19 '14 at 12:29

1 Answer

As suggested in the comments, I'm posting my answer here:

Look at this code first:

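from lxml import etree

if __name__ == '__main__':
    print("""Trying to create xml file like this:
        <html><body>Hello<br/>World</body></html>""")

    html_node = etree.Element("html")
    body_node = etree.SubElement(html_node, "body")
    body_node.text = "Hello"

    # encoding="unicode" makes tostring() return str instead of bytes,
    # so the result can be concatenated with the step labels on Python 3.
    print("Step1:" + etree.tostring(html_node, encoding="unicode"))

    # A new element with no text and no children.
    br_node = etree.SubElement(body_node, "br")
    print("Step2:" + etree.tostring(html_node, encoding="unicode"))

    # tail is the text that follows the element's closing tag.
    br_node.tail = "World"
    print("Step3:" + etree.tostring(html_node, encoding="unicode"))

    # text is the content inside the element itself.
    br_node.text = "Yeah?"
    print("Step4:" + etree.tostring(html_node, encoding="unicode"))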

Here is the output:

Trying to create xml file like this:
        <html><body>Hello<br/>World</body></html>
Step1:<html><body>Hello</body></html>
Step2:<html><body>Hello<br/></body></html>
Step3:<html><body>Hello<br/>World</body></html>
Step4:<html><body>Hello<br>Yeah?</br>World</body></html>

What I was originally trying to figure out was:

Why is br_node serialized as <br/> rather than <br></br>?

Compare step 3 and step 4 and the answer becomes clear:

If an element has no content (no text and no children), it is serialized in the short form <name/>. As soon as it has content, it is written with separate opening and closing tags.

Because <br/> already has a familiar meaning in HTML, this simple question confused me for a long time.
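
To show that the short form is only a matter of serialization and not of how the element was created, here is a minimal sketch. Note the assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, because its tostring() has a short_empty_elements switch, and build_tree is just a helper name made up for this example. The standard library writes the short form as <br /> with a space, whereas lxml printed <br/> above.

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for lxml.etree


def build_tree():
    """Build html > body ("Hello") > br (tail "World") as an element tree."""
    html_node = ET.Element("html")
    body_node = ET.SubElement(html_node, "body")
    body_node.text = "Hello"
    br_node = ET.SubElement(body_node, "br")  # no text, no children
    br_node.tail = "World"
    return html_node


tree = build_tree()

# Default: an element without content is written in the short form.
short_form = ET.tostring(tree, encoding="unicode")

# Same tree, but the serializer is told not to use the short form.
long_form = ET.tostring(tree, encoding="unicode", short_empty_elements=False)

print(short_form)
print(long_form)
```

The tree is identical in both calls. Only the serializer option differs, and the second call writes the br element as <br></br>.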

I hope this post helps others who run into the same thing.

VELVETDETH