Remove parent tag (without removing children) with ElementTree

Question

I'm using ElementTree to modify the following xml:

<li>
  <p>Some stuff goes in <b>bold</b> here </p>
</li>

I would like to remove all <p> from my <li> elements but keep the contents.

Like this:

<li>Some stuff goes in <b>bold</b> here</li>

I am currently using the following code, which works only in simple cases (no text/tail to preserve, etc.):

# strip <p> from <li> elements
liElements = rootNode.findall('.//li')
for elem in liElements:
    para = elem.find(".//p")
    for child in para:
        elem.append(child)
    elem.text = para.text
    elem.remove(para)

There must be an easier way to just strip a formatting tag... I hope?

Looks like you are processing HTML instead; unless it is really XHTML, use an HTML parser. The [BeautifulSoup HTML library](http://www.crummy.com/software/BeautifulSoup/bs4/) has a [`.unwrap()` method](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap) for just this task. – Martijn Pieters, May 27 '13 at 20:16
My example uses HTML tags, but my content is not just HTML; it has mostly custom tags. I imagine it's all the same to a parser, though. All of my parsing code (there's a lot) uses ElementTree, so I'd like to find a way to use that before converting to a different parsing library. – akevan, May 27 '13 at 20:45

Martijn Pieters · Accepted Answer · 2013-05-27T20:51:41.240


Perhaps the easiest way is to not use ElementTree to process HTML, but to use BeautifulSoup instead; the library handles unwrapping explicitly through the .unwrap() method:

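from bs4 import BeautifulSoup

soup = BeautifulSoup(markup, 'html.parser')  # markup: your document as a string (assumed name)
for elem in soup.find_all('li'):
    for para in elem.find_all('p'):
        para.unwrap()  # replace the <p> tag with its own contents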

edited May 27 '13 at 20:51

answered May 27 '13 at 20:44

Martijn Pieters

Yup, no easier way with ElementTree. I've ported some section of my code over to BeautifulSoup... IMHO it's slightly nicer to use. Pretty close to lxml though. – akevan May 28 '13 at 22:05
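
For anyone who needs to stay with ElementTree, here is a minimal sketch of an unwrap that also preserves text and tail, which the code in the question drops. ElementTree itself has no such method: `unwrap` and `strip_tag` are hypothetical helper names made up for this example, and the sketch assumes the standard-library `xml.etree.ElementTree` and that the tag being stripped is never the root element.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, child):
    """Replace `child` inside `parent` with child's text, children, and tail."""
    siblings = list(parent)
    index = siblings.index(child)
    # Text that precedes an element lives in the previous sibling's tail,
    # or in parent.text when there is no previous sibling.
    previous = siblings[index - 1] if index > 0 else None

    def append_text(text):
        if not text:
            return
        if previous is None:
            parent.text = (parent.text or "") + text
        else:
            previous.tail = (previous.tail or "") + text

    append_text(child.text)
    # Move the grandchildren up, keeping their order and their own tails.
    for offset, grandchild in enumerate(list(child)):
        parent.insert(index + offset, grandchild)
        previous = grandchild
    append_text(child.tail)
    parent.remove(child)


def strip_tag(root, tag):
    """Unwrap every descendant of `root` whose tag equals `tag`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Reverse document order: nested matches are unwrapped before their ancestors.
    for elem in reversed(list(root.iter(tag))):
        if elem is not root:
            unwrap(parent_of[elem], elem)


root = ET.fromstring("<ul><li><p>Some stuff goes in <b>bold</b> here</p></li></ul>")
strip_tag(root, "p")
print(ET.tostring(root, encoding="unicode"))
```

To limit this to `<p>` elements directly under `<li>`, as in the question, call `unwrap(li, p)` for each `li` in `root.iter('li')` and each `p` in `li.findall('p')` instead of using `strip_tag`.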
