I keep getting a traceback saying AttributeError: 'NoneType' object has no attribute 'startswith' near the end of my script. Up to that point I am scraping a bunch of different pages and pulling them all into one list, then scraping the final URL for each business page. For each each_page I scrape all the 'a' tags off of the page, and then I want to search through them and keep only the ones whose href starts with '/401k/'. I know I could probably do it without appending to yet another list, because I feel like I have too many already. I was thinking of doing it like this:
for a in soup.findAll('a'):
    href = a.get('href')
    if href.startswith('/401k/'):
        final_url.append(href)
# Even when I try it this way I get the same "no attribute" error.
Either way it isn't getting the data, and I can't figure out what is going on. Maybe I've been looking at the screen too much.
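While staring at it I put together this tiny standalone check with hard-coded HTML (no requests), and it looks like a.get('href') comes back as None for anchors that don't have an href attribute, which would explain the message, though I'm not sure that's what is actually happening on the real pages:

from bs4 import BeautifulSoup

# Standalone check with made-up HTML: the second anchor has no href
# attribute, so a.get('href') returns None for it.
html = '<a href="/401k/abc">plan</a><a name="top">no href here</a>'
soup = BeautifulSoup(html, 'html.parser')
for a in soup.findAll('a'):
    print(repr(a.get('href')))   # prints '/401k/abc' then None

Here is the full script: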
import requests
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
hrefs = []
ratings = []
pages = []
s_names = []
final_url = []
for href in soup.findAll('a'):
    if 'href' in href.attrs:
        hrefs.append(href.attrs['href'])
for good_ratings in hrefs:
    if good_ratings.startswith('/ratings/'):
        # url[:-9] strips the trailing '/ratings/' from the base url
        ratings.append(url[:-9] + good_ratings)
del ratings[0]
del ratings[27:]
for each_rating in ratings:
    page = requests.get(each_rating)
    soup = BeautifulSoup(page.text, 'html.parser')
    # if the rating page is split into letter pages, collect each one,
    # otherwise just keep the page itself
    span = soup.find('span', class_='letter-pages')
    if span:
        for a in span.find_all('a'):
            href = a.get('href')
            pages.append('https://www.brightscope.com' + href)
    else:
        pages.append(page.url)
hrefs = []
pages = set(pages)
for each_page in pages:
    page = requests.get(each_page)
    soup = BeautifulSoup(page.text, 'html.parser')
    for a in soup.findAll('a'):
        href = a.get('href')
        s_names.append(href)
# The traceback (AttributeError: 'NoneType' object has no attribute 'startswith') starts at the loop below.
for each in s_names:
    if each.startswith('/401k'):
        final_url.append(each)
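For what it's worth, this is the kind of guard I was thinking of adding, filtering out the None entries before calling startswith (same variable names as above; I'm not sure whether this is the right fix or whether it just hides entries I actually want):

# Skip entries that came back as None (anchors with no href attribute).
for each in s_names:
    if each is not None and each.startswith('/401k'):
        final_url.append(each)

Is that all that is going on here, or is something earlier in the script handing back None where I expect a URL?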