BeautifulSoup: can't extract all the elements in one loop

Question

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p>p_string</p><div>div_string</div></div>', 'html.parser')
for m in soup.div:
    print("extract(first loop): ", m.extract())
print("current soup.div(first loop): ", soup.div)  # still contains the inner div block
print('___________________________________________________________')

# I have to run a second loop to remove the remaining div block. Why?
for m in soup.div:
    print("extract(second loop): ", m.extract())

print("current soup.div(second loop): ", soup.div)  # now empty

Result:

extract(first loop):  <p>p_string</p>
current soup.div(first loop):  <div><div>div_string</div></div>
___________________________________________________________
extract(second loop):  <div>div_string</div>
current soup.div(second loop):  <div></div>

Why didn't the first for loop extract all the elements (both the p and the div)?

score 1 · Accepted Answer · edited May 23 '17 at 11:49


This happens because extract() removes a tag from the tree, so the loop is removing the div's children while iterating over them. It is essentially the same as removing items from a list while looping over it: once the first child is removed, the remaining child shifts into the position the iterator has already passed, and the loop ends early.
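The same effect can be reproduced with a plain Python list. This is only an analogy, assuming the tag's children are iterated like an ordinary list; the names `children`, `seen` and `seen_all` are made up for illustration.

```python
# Removing items from a list while iterating over it skips elements.
children = ["p", "div"]
seen = []
for child in children:
    seen.append(child)
    children.remove(child)  # mutates the list being iterated

print(seen)      # ['p']
print(children)  # ['div']

# Iterating over a copy visits every item, even though the original shrinks.
children = ["p", "div"]
seen_all = []
for child in list(children):
    seen_all.append(child)
    children.remove(child)

print(seen_all)  # ['p', 'div']
print(children)  # []
```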

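Instead, use .find_all(), which collects the matching tags into a list before the loop starts, so extracting them does not disturb the iteration:

for m in soup.div.find_all():
    print(m.extract())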
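For reference, here is a self-contained sketch of that fix applied to the markup from the question, next to one possible alternative that snapshots the direct children with list(). The explicit 'html.parser' argument and the variable names (`html`, `soup_a`, `soup_b`, `removed_a`, `removed_b`) are assumptions made for this example, not part of the original post.

```python
from bs4 import BeautifulSoup

html = "<div><p>p_string</p><div>div_string</div></div>"

# Option A: find_all() builds the list of descendant tags up front.
soup_a = BeautifulSoup(html, "html.parser")
removed_a = [str(tag.extract()) for tag in soup_a.div.find_all()]
print(removed_a)   # ['<p>p_string</p>', '<div>div_string</div>']
print(soup_a.div)  # <div></div>

# Option B: copy the direct children into a list before extracting them.
soup_b = BeautifulSoup(html, "html.parser")
removed_b = [str(child.extract()) for child in list(soup_b.div.children)]
print(removed_b)   # ['<p>p_string</p>', '<div>div_string</div>']
print(soup_b.div)  # <div></div>
```

The two options differ on other markup: find_all() with no arguments returns tags at every nesting depth and skips bare text nodes, while list(soup.div.children) covers only direct children but includes text nodes.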


answered Oct 30 '14 at 06:51

alecxe
