I am trying to extract the honour awards for every player in the Premier League table, but when I run this in a loop over many HTML links I get the following error:
AttributeError: 'NoneType' object has no attribute 'find_all'
I have found that this may be because some links do not contain the elements I am searching for. However, after printing the variable storing this information, I get:
#example
None
None
None
None
<table class="honoursAwards">
<tbody>
<tr>
<th>Premier League Champion</th>
<th class="u-text-right">1</th>
</tr>
<tr>
<td colspan="3">
<table class="playerSidebarNested">
<tbody>
<tr>
<td>2019/20</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
I am guessing that the None values are what trip up the collection?
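A minimal standalone case seems to reproduce it (the HTML string below is made up, not fetched from the site): find() returns None when the tag is absent, and calling find_all on that None raises this exact error.

```python
from bs4 import BeautifulSoup

# A page without the honoursAwards table, like the ones that printed None
html = "<div><p>no honours table on this page</p></div>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", {"class": "honoursAwards"})
print(table)  # prints: None

try:
    table.find_all("th")
except AttributeError as e:
    print(e)  # prints: 'NoneType' object has no attribute 'find_all'
```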
Here's what I have done:
import requests
from bs4 import BeautifulSoup

base = 'https://www.premierleague.com/players/{}/'
link = 'https://footballapi.pulselive.com/football/players'
payload = {
    'pageSize': '30',
    'compSeasons': '418',
    'altIds': 'true',
    'page': 0,
    'type': 'player',
    'id': '-1',
    'compSeasonId': '418'
}
football_pages = []
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    s.headers['referer'] = 'https://www.premierleague.com/'
    s.headers['origin'] = 'https://www.premierleague.com'
    while True:
        res = s.get(link, params=payload)
        if not res.json()['content']:
            break
        for item in res.json()['content']:
            football_pages.append(base.format(int(item['id'])))
        payload['page'] += 1
for urls in football_pages:
    pages = requests.get(urls)
    soup_2 = BeautifulSoup(pages.content, 'lxml')
    table = soup_2.find('table', {'class': 'honoursAwards'})
    tables = [item.text for item in table.find_all('th')]  # error is here
    print(tables)
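I assume a guard like the following would at least skip the pages that crash (a sketch only; extract_honours is a name I made up, not from any library), but I am not sure it is the right approach:

```python
from bs4 import BeautifulSoup

def extract_honours(html):
    """Return the <th> texts from the honoursAwards table, or None if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "honoursAwards"})
    if table is None:  # this player page has no honours section
        return None
    return [th.get_text(strip=True) for th in table.find_all("th")]

# In the loop this would become something like:
# for urls in football_pages:
#     honours = extract_honours(requests.get(urls).content)
#     if honours is not None:
#         print(honours)
```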