I have been trying to scrape some HTML and extract certain text from it.
The HTML has tags that are empty or that contain only whitespace.
How can I remove all of those tags from my tree? I am using Beautiful Soup and Python.
You can use the `decompose()` method to do this. It removes a tag from the tree and destroys it along with everything inside it:
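```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
a_tag = soup.a
soup.i.decompose()  # remove <i> and its contents from the tree
print(a_tag)
# <a href="http://example.com/">I linked to </a>
```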
You will still need to loop over the tags yourself, find the ones whose content is empty or whitespace-only, and call `decompose()` on each of them to remove it from the tree.
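A minimal sketch of that loop follows, assuming "empty" means a tag with no child tags and no non-whitespace text. The helper names `is_empty` and `remove_empty_tags` and the `KEEP_TAGS` set of void elements are hypothetical choices for illustration, not part of Beautiful Soup's API; adjust them to your own definition of "empty".

```python
from bs4 import BeautifulSoup

# Hypothetical: void elements like <br> or <img> never have content, so this
# sketch keeps them instead of treating them as "empty". Adjust to taste.
KEEP_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def is_empty(tag):
    """Return True if the tag has no child tags and no non-whitespace text."""
    if tag.name in KEEP_TAGS:
        return False
    # find(True) matches any child tag; get_text(strip=True) drops whitespace.
    return tag.find(True) is None and not tag.get_text(strip=True)


def remove_empty_tags(soup):
    # find_all(True) returns tags in document order, so reversing it visits
    # descendants before their ancestors. A parent that only held empty
    # children is therefore already empty when it is checked, and no tag is
    # inspected after one of its ancestors has been decomposed.
    for tag in reversed(soup.find_all(True)):
        if is_empty(tag):
            tag.decompose()
    return soup


html = "<div><p>Keep me</p><p>   </p><span><b></b></span><p><br></p></div>"
soup = BeautifulSoup(html, "html.parser")
remove_empty_tags(soup)
print(soup)
# <div><p>Keep me</p><p><br/></p></div>
```

Walking the list in reverse is the design choice that matters here: the whitespace-only `<p>` and the empty `<b>` go away, and the `<span>` that only wrapped `<b>` is removed in the same pass, while the `<p>` holding a `<br>` survives because it still has a child tag.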