
I have been trying to scrape some HTML and extract certain text from it.

The HTML contains tags that are empty or that contain only whitespace.

How can I remove all of those tags from the parse tree? I am using Beautiful Soup and Python.

Karl Taylor

1 Answer


You can use the decompose() method to do this. It removes a tag, and everything inside it, from the tree.

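from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
a_tag = soup.a

# decompose() removes the <i> tag and its contents from the tree
soup.i.decompose()

print(a_tag)
# <a href="http://example.com/">I linked to </a>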

You will still need to loop over the tags yourself, identify the ones whose content is empty or whitespace-only, and call decompose() on each of them to remove it from the tree.
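A minimal sketch of that loop, assuming bs4 with the built-in "html.parser". The helper name remove_empty_tags and the KEEP set of void elements (tags such as br and img that are legitimately empty) are illustrative choices of mine, not part of Beautiful Soup's API.

```python
from bs4 import BeautifulSoup

# Hypothetical choice: void elements are empty by design, so keep them.
KEEP = {"br", "hr", "img", "input", "meta", "link"}


def remove_empty_tags(soup):
    """Decompose every tag that has no child tags and no non-whitespace text.

    Hypothetical helper (not part of Beautiful Soup); modifies soup in place.
    """
    # find_all(True) returns every tag in document order, parents before
    # children; reversing it visits each child before its parent.
    for tag in reversed(soup.find_all(True)):
        if tag.name in KEEP:
            continue
        # A tag counts as empty if no child tag remains and all of its
        # text is whitespace (get_text(strip=True) then returns "").
        if tag.find(True) is None and not tag.get_text(strip=True):
            tag.decompose()


html = "<div><p>Hello</p><p>   </p><span></span><div><b> </b></div><br/></div>"
soup = BeautifulSoup(html, "html.parser")
remove_empty_tags(soup)
print(soup)
# <div><p>Hello</p><br/></div>
```

Because children are examined before their parents, a wrapper such as <div><b> </b></div> is removed as well: once the empty <b> is gone, the <div> has nothing left in it.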

Rafael