For the first `print tag` I am getting a large list of hundreds of `<a` tags. For the second `print tag` I am getting a list with four `<a` tags, not including the ones that I want.
One of the tags that I want is at the end of `tags`. After printing all several hundred tags, I print the last tag, and that prints the correct end tag as it should. But when I run another `for` loop over the same (unchanged) list `tags`, the result is not just different, but significantly different.
The phenomenon happens with or without the `print '\n\n\n'`; that line is only there to make the split between the two prints easier for me to see.
What is happening to this list in between the first and second `for` loop to cause this problem?
(This code is exactly as I have it in my script. Originally I didn't have the lines from the first `for` loop until the empty line; I added them to debug why the correct URL is missing from the end result.)
EDIT: Also, here is what is being printed for all the `print` statements (only the last section of the first `print` within the `for` loop):
import urllib
from bs4 import BeautifulSoup
startingList = ['http://www.stowefamilylaw.co.uk/']
for url in startingList:
    try:
        html = urllib.urlopen(url)
        soup = BeautifulSoup(html,'lxml')
        tags = soup('a')
        for tag in tags:
            print tag
        print tags[-1]
        print '\n\n\n'

        for tag in tags:
            print tag
            if not tag.get('href', None).startswith('..'):
                continue
    except:
        continue
....
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/decrees-orders-forms/" itemprop="url">Decrees, Orders & Forms</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/international-divorce/" itemprop="url">International Divorce</a>
<a class="shiftnav-target"><i class="fa fa-chevron-left"></i> Back</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>
<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>
<a href="#spu-5086" style="color: #fff"><div class="callbackbutton"><i class="fa fa-phone" style="font-size: 16px"></i> Request Callback </div></a>
<a href="#spu-5084" style="color: #fff"><div class="callbackbutton"><i class="fa fa-envelope-o" style="font-size: 16px"></i> Quick Enquiry </div></a>
<a class="ubermenu-responsive-toggle ubermenu-responsive-toggle-main ubermenu-skin-black-white-2 ubermenu-loc-primary" data-ubermenu-target="ubermenu-main-3-primary"><i class="fa fa-bars"></i>Main Menu</a>
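One way to narrow this down might be to check whether the second loop ends early because of an exception that the bare `except:` hides. The sketch below is only an illustration under stated assumptions: it uses plain dictionaries as hypothetical stand-ins for the parsed tags (so it needs neither the network nor BeautifulSoup), and the helper name `scan` and the `fake_tags` data are made up for this example. It relies on the fact that `.get('href', None)` returns `None` when there is no `href`, and that calling `.startswith` on `None` raises `AttributeError`. The last tag printed above has no `href` attribute, which is why the fourth stand-in has none either.

```python
# Hypothetical stand-ins for the parsed <a> tags: plain dicts, so this
# sketch runs without network access or BeautifulSoup.
fake_tags = [
    {'href': 'http://www.stowefamilylaw.co.uk/'},
    {'href': '#spu-5086'},
    {'href': '#spu-5084'},
    {'class': 'ubermenu-responsive-toggle'},  # no href, like the fourth printed tag
    {'href': '../contact/'},                  # made-up relative link, never reached
]


def scan(tags):
    """Loop like the second for loop, but keep the exception instead of hiding it."""
    seen = []
    error = None
    try:
        for tag in tags:
            seen.append(tag)
            # .get returns None when 'href' is missing; None has no .startswith
            if not tag.get('href', None).startswith('..'):
                continue
    except Exception as exc:  # a bare "except: continue" would discard this
        error = exc
    return seen, error


seen, error = scan(fake_tags)
print(len(seen))             # how many tags were visited before the loop stopped
print(type(error).__name__)  # which exception ended the loop
```

If the same thing happens in the real script, temporarily replacing `except: continue` with `except Exception as e: print e` should show whether an `AttributeError` is what cuts the second loop short.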