How would I use BeautifulSoup to remove only a tag? The method I found deletes the tag and all other tags and content inside it. I want to remove only the tag and leave everything inside it untouched, e.g.

change this:

<div>
<p>dvgbkfbnfd</p>
<div>
<span>dsvdfvd</span>
</div>
<p>fvjdfnvjundf</p>
</div>

to this:

<p>dvgbkfbnfd</p>
<span>dsvdfvd</span>
<p>fvjdfnvjundf</p>
Ricardo Altamirano
Blainer
  • Conceptually, you want to [replace](http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#Replacing%20one%20Element%20with%20Another) the `<div>` with its contents. – jimw May 11 '12 at 17:28
  • +1 to that. The tag includes its contents -- it doesn't make any conceptual sense from the perspective of the parser to delete the tag without deleting the contents (it would be like deleting a directory but not deleting any files inside it). So you need to replace the tag itself with its own contents. – Andrew Gorcester May 11 '12 at 17:34
  • perfect :) this worked, i never thought about it that way. – Blainer May 11 '12 at 17:41
  • possible duplicate of [Remove a tag using BeautifulSoup but keep its contents](http://stackoverflow.com/questions/1765848/remove-a-tag-using-beautifulsoup-but-keep-its-contents) – Mark Longair May 11 '12 at 17:43

1 Answer

I've voted to close this as a duplicate, but in case it's useful, applying slacy's answer from the linked question gives you this solution:

from BeautifulSoup import BeautifulSoup

html = '''
<div>
<p>dvgbkfbnfd</p>
<div>
<span>dsvdfvd</span>
</div>
<p>fvjdfnvjundf</p>
</div>
'''

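soup = BeautifulSoup(html)
# findAll returns the full list of matches up front, so each <div> can be
# replaced by its own children while looping
for match in soup.findAll('div'):
    match.replaceWithChildren()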

print soup

... which produces the output:

<p>dvgbkfbnfd</p>

<span>dsvdfvd</span>

<p>fvjdfnvjundf</p>
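
The code above targets BeautifulSoup 3 on Python 2. For BeautifulSoup 4, where the package is imported from `bs4` and `replaceWithChildren` is named `unwrap`, a minimal sketch of the same approach might look like the following. It assumes Python 3 with the `bs4` package installed, and it picks the built-in `html.parser` backend only to avoid extra dependencies; neither choice comes from the original answer.

```python
from bs4 import BeautifulSoup

html = """
<div>
<p>dvgbkfbnfd</p>
<div>
<span>dsvdfvd</span>
</div>
<p>fvjdfnvjundf</p>
</div>
"""

# Assumption: "html.parser" is used so no third-party parser is required.
soup = BeautifulSoup(html, "html.parser")

# find_all collects every <div> before the loop starts, so removing the
# wrappers while iterating is safe.
for div in soup.find_all("div"):
    # unwrap() removes the tag itself and leaves its children in place.
    div.unwrap()

# The <div> wrappers are gone; the <p> and <span> tags remain.
print(soup)
```

To strip a different wrapper, pass its tag name to `find_all` instead of `"div"`.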
Community
Mark Longair
  • Note: as per https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names : from BS4 onward, it is best to use "unwrap" rather than "replaceWithChildren". – SylvainD Jun 23 '20 at 15:32