
I have an HTML snippet like the one below:

<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>

I am trying to extract the text from every span inside the div using:

soup = BeautifulSoup(html,'html.parser')
spans=soup.select('div.single_baby_name_description>span') 

But spans[0].text returns only the text of the first span, and spans[1].text raises IndexError: list index out of range.

Any help would be greatly appreciated.

Kevin Auds

2 Answers


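Switching the parser to 'lxml' did the job for me; 'html.parser' did not. The likely cause is the invalid </br> closing tags in the snippet, which the two parsers recover from differently.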

This will work:

soup = BeautifulSoup(html, 'lxml')
spans = soup.select('div.single_baby_name_description span')
spans = [span.text for span in spans]
print(spans)

Output:

['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
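
If installing lxml is not an option, one possible workaround is to repair the markup before parsing. The sketch below assumes the stray </br> end tags are what trips up 'html.parser', so it rewrites them as plain <br> tags and then runs the original child selector. The variable names cleaned and texts are only illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>"""

# Assumption: the invalid </br> end tags are the problem, so turn them
# into ordinary <br> tags before handing the markup to the parser.
cleaned = html.replace('</br>', '<br>')

soup = BeautifulSoup(cleaned, 'html.parser')

# With the markup repaired, every span is a direct child of the div,
# so the child combinator from the question matches all of them.
spans = soup.select('div.single_baby_name_description > span')
texts = [span.get_text(strip=True) for span in spans]
print(texts)
```

A plain string replace is crude, but it keeps the example free of extra dependencies; for messier real-world pages a more lenient parser is probably the safer route.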
teller.py3

Looking at the Beautiful Soup docs:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup

Using a tag name as an attribute (for example soup.span) returns only the first matching tag, as you've described. Have you tried find_all?

soup.find_all('span')
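
To make the suggestion concrete, here is a minimal sketch that collects every span with find_all and keys each value by the suffix of its class name. It searches the whole document rather than only the div, so it does not depend on how the parser nests the spans. The details dictionary and the idea of splitting the class on the first hyphen are assumptions made for this example, not something from the question.

```python
from bs4 import BeautifulSoup

html = """<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>"""

soup = BeautifulSoup(html, 'html.parser')

details = {}
# find_all walks the whole tree, so it returns every span in the document.
for span in soup.find_all('span'):
    classes = span.get('class', [])
    if not classes:
        continue  # skip spans that carry no class attribute
    # "28816-meaning" -> "meaning" (hypothetical naming convention)
    key = classes[0].split('-', 1)[-1]
    details[key] = span.get_text(strip=True)

print(details)
```

On a full page this would also pick up unrelated spans, so in practice you may want to filter on the class prefix as well.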