With BeautifulSoup:
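from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)

html = '<h3>freedom machines.</h3><p>dom.</p><br/><p>The robust display.</p>'
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
text = soup.find("h3").string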
This is a basic use of BeautifulSoup. Create a BeautifulSoup object with your string as parameter. Then use its find method to find the tag with the name you're looking for. Finally, get the text the tag surrounds with its string attribute.
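One caveat about the string attribute is worth a quick sketch: it only returns text when the tag has a single child, and gives None otherwise, while get_text() joins all the nested text. The HTML below is a made-up variation on the example, and the bs4 package with the html.parser backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical variation: the heading now contains a nested <b> tag.
html = "<h3>freedom <b>machines</b>.</h3>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("h3")

# .string is None because <h3> has several children, not a single string.
print(tag.string)

# get_text() concatenates every piece of text inside the tag.
print(tag.get_text())
```

If the headings you scrape may contain inline markup, get_text() is probably the safer choice.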
If you know that your text is in an <h1>, <h2> or <h3> but you don't know which, just try all of them. You can even check the three at once:
tag = soup.find("h1") or soup.find("h2") or soup.find("h3")
text = tag.string
The or operator returns the first operand that evaluates as true. In this case, that means the first soup.find result that is not None.
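Here is a small sketch of that fallback on a page with no <h1>. The HTML is invented for illustration and the bs4 package is assumed. It also guards against the case where none of the three tags exists, since tag would then be None and tag.string would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical page: no <h1>, but an <h2> and an <h3>.
html = "<p>intro</p><h2>Second-level title</h2><h3>Third-level title</h3>"
soup = BeautifulSoup(html, "html.parser")

# soup.find("h1") is None (falsy), so `or` moves on and returns the <h2> tag.
tag = soup.find("h1") or soup.find("h2") or soup.find("h3")

# If no heading exists at all, tag is None; check before reading .string.
text = tag.string if tag is not None else None
print(text)  # Second-level title
```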
The find method also accepts an iterable of tag names, so you can pass it a static tuple or list. The result will be the first tag object (if any) that matches any of the requested names.
tag = soup.find(("h1", "h2", "h3"))
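The two approaches are not quite equivalent, as this sketch tries to show: passing several names returns whichever matching tag comes first in the document, while the or chain prefers <h1> over <h2> over <h3> regardless of position. The HTML is made up for the demonstration, bs4 is assumed, and a list is used here because that is the form the Beautiful Soup documentation describes.

```python
from bs4 import BeautifulSoup

# Hypothetical page where the <h2> appears before the <h1>.
html = "<h2>Subtitle first</h2><h1>Main title</h1>"
soup = BeautifulSoup(html, "html.parser")

# Several names at once: the first matching tag in document order wins.
first_in_document = soup.find(["h1", "h2", "h3"])

# The `or` chain: <h1> wins if present, wherever it sits on the page.
by_priority = soup.find("h1") or soup.find("h2") or soup.find("h3")

print(first_in_document.string)  # Subtitle first
print(by_priority.string)        # Main title
```

Pick the form that matches your intent: document order, or a fixed preference among heading levels.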
Of course, it is better to know in advance exactly which tag contains what you want. If there are both <h1> and <h2> tags on the page, you won't know which one to use.
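When you cannot tell in advance, one possible way to investigate is to list every heading on the page with find_all and look at what each one holds. This is only a sketch on invented HTML, assuming the bs4 package.

```python
from bs4 import BeautifulSoup

# Hypothetical page with both an <h1> and an <h2>.
html = "<h1>Site name</h1><p>intro</p><h2>Article title</h2>"
soup = BeautifulSoup(html, "html.parser")

# Collect (tag name, text) for every heading, in document order.
headings = [(tag.name, tag.get_text()) for tag in soup.find_all(["h1", "h2", "h3"])]
print(headings)  # [('h1', 'Site name'), ('h2', 'Article title')]
```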