Beautiful Soup: 'NoneType' object has no attribute 'text'

Question

I got this code to work to scrape a table on a webpage, which I'm very happy with. However, on a rare occasion, a title might miss a 'genre' or an 'image URL' field. As soon as the scraper hits an item in the list that has a missing value it discontinues and gives me the 'NoneType' object has no attribute 'text' error.

How can I amend this code for it to continue scraping and just pass a N/A value for that specific column if a value is missing.

Your help is much appreciated!

from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request to the URL
url = "https://www.hebban.nl/rank"
response = requests.get(url,headers={'user-agent':'Mozilla/5.0'})

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the book titles, authors, and image url links
data = []
books = soup.find_all('div', class_='item')
for book in books:
    rank = book.h3.text.strip()
    title = book.find('a', class_='neutral').text.strip()
    author = book.find('span', class_='author').text.strip()
    genre = book.find('a', class_='btn btn4 yf-genre').text.strip()

    ##img_url = book.img.get('data-src')
    
    print(rank + ' by ' + author)
    ##print('Image URL: ' + img_url)

    data.append({'rank': rank, 'author': author, 'title': title, 'genres': genre})

# Create a dataframe and save it to a csv
df = pd.DataFrame (data)
df.to_csv('hebbanexport.csv', index=False)

HedgeHog · Accepted Answer · 2023-03-17T11:19:32.867

1

Simply check if element you try to find is available, before apply any method, if not set to None or any value you like to use.

genre = book.find('a', class_='btn btn4 yf-genre').text.strip() if book.find('a', class_='btn btn4 yf-genre') else None

You could also use a function to check:

def check_if_element_is_available(e):
    if e:
        return e.text.strip()
    else:
        return None

for book in books:
    rank = check_if_element_is_available(book.h3)
    title = check_if_element_is_available(book.find('a', class_='neutral'))
    author = check_if_element_is_available(book.find('span', class_='author'))
    genre = check_if_element_is_available(book.find('a', class_='btn btn4 yf-genre'))

edited Mar 17 '23 at 11:19

answered Mar 17 '23 at 11:14

HedgeHog

22,146
4
14
36

In case you use the function approach I would suggest to use a `try` `catch AttributeError` – Jorge Luis Mar 17 '23 at 11:20
@JorgeLuis with focus to exceptions only, good point - https://stackoverflow.com/questions/1835756/using-try-vs-if-in-python – HedgeHog Mar 17 '23 at 11:45

Beautiful Soup: 'NoneType' object has no attribute 'text'

1 Answers1