I am trying to extract the honour awards for every player in the Premier League table, but when I run this in a loop over many HTML links I get the following error:
AttributeError: 'NoneType' object has no attribute 'find_all'
I have found that this may be because some links do not contain the elements I am searching for. However, after printing the variable storing this information, I get:
#example
None
None
None
None
<table class="honoursAwards">
<tbody>
<tr>
<th>Premier League Champion</th>
<th class="u-text-right">1</th>
</tr>
<tr>
<td colspan="3">
<table class="playerSidebarNested">
<tbody>
<tr>
<td>2019/20</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
I am guessing that the None values are what trip up the collection?
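A minimal standalone case seems to reproduce it (the HTML string below is made up, not fetched from the site): find() returns None when the tag is absent, and calling find_all on that None raises this exact error.

```python
from bs4 import BeautifulSoup

# A page without the honoursAwards table, like the ones that printed None
html = "<div><p>no honours table on this page</p></div>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", {"class": "honoursAwards"})
print(table)  # prints: None

try:
    table.find_all("th")
except AttributeError as e:
    print(e)  # prints: 'NoneType' object has no attribute 'find_all'
```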
Here's what I have done:
import requests
from bs4 import BeautifulSoup

base = 'https://www.premierleague.com/players/{}/'
link = 'https://footballapi.pulselive.com/football/players'
payload = {
    'pageSize': '30',
    'compSeasons': '418',
    'altIds': 'true',
    'page': 0,
    'type': 'player',
    'id': '-1',
    'compSeasonId': '418'
}
football_pages = []
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    s.headers['referer'] = 'https://www.premierleague.com/'
    s.headers['origin'] = 'https://www.premierleague.com'
    while True:
        res = s.get(link, params=payload)
        if not res.json()['content']:
            break
        for item in res.json()['content']:
            football_pages.append(base.format(int(item['id'])))
        payload['page'] += 1
for urls in football_pages:
    pages = requests.get(urls)
    soup_2 = BeautifulSoup(pages.content, 'lxml')
    table = soup_2.find('table', {'class': 'honoursAwards'})
    tables = [item.text for item in table.find_all('th')]  # error is here
    print(tables)
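I assume a guard like the following would at least skip the pages that crash (a sketch only; extract_honours is a name I made up, not from any library), but I am not sure it is the right approach:

```python
from bs4 import BeautifulSoup

def extract_honours(html):
    """Return the <th> texts from the honoursAwards table, or None if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "honoursAwards"})
    if table is None:  # this player page has no honours section
        return None
    return [th.get_text(strip=True) for th in table.find_all("th")]

# In the loop this would become something like:
# for urls in football_pages:
#     honours = extract_honours(requests.get(urls).content)
#     if honours is not None:
#         print(honours)
```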