How to extract text from span surrounded by div using beautifulsoup

Question

I have an HTML snippet like the one below:

<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>

I tried to extract the text from every span inside the div using:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
spans = soup.select('div.single_baby_name_description>span')

But spans[0].text returns only the text of the first span, and spans[1].text raises IndexError: list index out of range.

Any help would be greatly appreciated.

teller.py3 · Accepted Answer · 2018-10-06T00:55:20.470

1

I found that only 'lxml' does the job here. For some reason, 'html.parser' does not.

This will work:

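from bs4 import BeautifulSoup

# 'lxml' must be installed separately (pip install lxml)
soup = BeautifulSoup(html, 'lxml')
# descendant selector: matches spans at any depth inside the div
spans = soup.select('div.single_baby_name_description span')
spans = [span.text for span in spans]
print(spans)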

Output:

['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
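For reference, here is a self-contained sketch of the same approach with the snippet from the question inlined. Note that the selector above uses a descendant combinator (a space) rather than the child combinator (>) from the question, so it matches spans at any depth inside the div. The use of the built-in 'html.parser' here is an assumption made only so the sketch runs without extra packages; 'lxml' can be substituted as in the answer if it is installed.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, including the non-standard </br> tags
html = """
<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
"""

# Assumption: built-in parser, chosen only to avoid an extra dependency
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): spans anywhere inside the div
spans = soup.select("div.single_baby_name_description span")

# get_text(strip=True) trims surrounding whitespace from each span's text
texts = [span.get_text(strip=True) for span in spans]
print(texts)
```

If the list comes back shorter than expected, printing soup.prettify() shows how the chosen parser actually nested the tags.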

edited Oct 06 '18 at 00:55

answered Oct 06 '18 at 00:45

teller.py3


1

Thanks, I switched to the 'lxml' parser just now. It works. – Kevin Auds Oct 06 '18 at 00:54
I recommend using lxml over html.parser anyway. By the way, see the edit to my answer. – teller.py3 Oct 06 '18 at 00:57

score 0 · Answer 2 · answered Oct 06 '18 at 00:03


According to the Beautiful Soup docs:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup

accessing a tag by name as an attribute (for example, soup.span) returns only the first match, which fits what you described. Have you tried:

soup.find_all('span')
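As a rough sketch of how find_all could be applied to the snippet from the question: the search is scoped to the div first, and each label is paired with the span that follows it. The names `div` and `details`, and the choice of 'html.parser', are assumptions made for illustration and do not come from the original posts.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = """
<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Limit the search to the one div of interest
div = soup.find("div", class_="single_baby_name_description")

# find_all searches all descendants by default, not just direct children
labels = div.find_all("label")
spans = div.find_all("span")

# Pair each label with its span; strip the trailing " :" from the label text
details = {
    label.get_text(strip=True).rstrip(" :"): span.get_text(strip=True)
    for label, span in zip(labels, spans)
}
print(details)
```

Pairing by position with zip assumes every label is followed by exactly one span, which holds for this snippet but may not hold for other pages.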

answered Oct 06 '18 at 00:03

Luke Corbett

