I just discovered Beautiful Soup, which seems very powerful. I'm wondering if there is an easy way to include the "alt" attribute of images in the extracted text. A simple example would be:
from bs4 import BeautifulSoup
html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.get_text())
This would result in:
Among the different sections of the orchestra you will find:
A in the strings
A in the brass
A in the woodwinds
But I would like the alt text of each image to be included in the extracted text, which would give:
Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
A clarinet and saxophone in the woodwinds
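For reference, here is a minimal sketch of one approach that might work: replace each <img> tag with its alt text before calling get_text(). The helper name text_with_alt is hypothetical (not part of Beautiful Soup), and the sketch assumes that an image without an alt attribute should contribute nothing to the text.

```python
from bs4 import BeautifulSoup

html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""


def text_with_alt(html):
    """Hypothetical helper: return the text of `html`, with each <img>
    replaced by the value of its alt attribute."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        # Swap the tag for a plain string; assume a missing alt means "".
        img.replace_with(img.get("alt", ""))
    return soup.get_text()


print(text_with_alt(html_doc))
```

Because the images are replaced in the parsed tree, this modifies the soup object it builds, so it may be preferable to parse a fresh copy if the original tree is still needed.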
Thanks