I am using BeautifulSoup to get price information for second-hand iPhone 7 smartphones. After fetching the relevant HTML and using the bs4 library to create a BeautifulSoup object called 'soup', I search for each product's information with the following code:

from bs4 import BeautifulSoup

# html holds the page source fetched earlier
soup = BeautifulSoup(html, 'html5lib')
products = soup.find_all('div', class_='feed-item sc-gqjmRU igneJk')

This gives me the correct HTML for each product. Now I go one layer deeper to get the description of each product:

descriptions = [x.find('p', class_='sc-kAzzGY kZncUf') for x in products]

The code above works, but it returns the whole <p> element rather than just its text, which is what I actually want. To get only the description text, I need to add .getText():

descriptions = [x.find('p', class_='sc-kAzzGY kZncUf').getText() for x in products]

This gives me the following error:

----> 1 descriptions = [x.find('p', class_='sc-kAzzGY kZncUf').getText() for x in products]

AttributeError: 'NoneType' object has no attribute 'getText'

However, the code below works fine:

descriptions = [x.find('p', class_='sc-kAzzGY kZncUf') for x in products]
descriptions[0].getText()

descriptions[0] should be the same thing as the value of x.find('p', class_='sc-kAzzGY kZncUf') that we get from the first iteration.

My question is: since both x.find(...) and descriptions[0] should give the same value, why does one raise an error while the other works?

Thank you in advance

Doga

2 Answers

It just means that at least one of the products has no element matching the .find('p', class_='sc-kAzzGY kZncUf') search criteria, so .find() returns None for that product.

You could add an extra check for that case:

for product in products:
    description_element = product.find('p', class_='sc-kAzzGY kZncUf')
    description = description_element.get_text() if description_element else "No Description"

    print(description)
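
If you would rather keep the list comprehension from the question, the same check can be moved into a small helper. This is only a sketch: the sample markup and the helper name description_of are made up for illustration, and html.parser is used here just so the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second product has no description <p>.
html = """
<div class="feed-item sc-gqjmRU igneJk"><p class="sc-kAzzGY kZncUf">iPhone 7, 32 GB</p></div>
<div class="feed-item sc-gqjmRU igneJk"><span>no description here</span></div>
<div class="feed-item sc-gqjmRU igneJk"><p class="sc-kAzzGY kZncUf">iPhone 7, 128 GB</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
products = soup.find_all("div", class_="feed-item sc-gqjmRU igneJk")


def description_of(product, default="No Description"):
    """Return the description text, or `default` if the <p> is missing."""
    element = product.find("p", class_="sc-kAzzGY kZncUf")
    return element.get_text() if element is not None else default


descriptions = [description_of(p) for p in products]
print(descriptions)
# ['iPhone 7, 32 GB', 'No Description', 'iPhone 7, 128 GB']
```

The list stays the same length as products, so descriptions can still be lined up with prices or other per-product fields by index.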
alecxe

When you run descriptions = [x.find('p', class_='sc-kAzzGY kZncUf').getText() for x in products], getText() is called on the result of x.find('p', class_='sc-kAzzGY kZncUf') for every product. If even one of those results is None, the whole list comprehension fails with the AttributeError. When you instead build the list as descriptions = [x.find('p', class_='sc-kAzzGY kZncUf') for x in products], nothing is called on the results, so the list is created without error. Its first element is not None, which is why descriptions[0].getText() works, but at least one of the other elements is.
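
The same behaviour can be reproduced without any scraping. In this rough sketch the list found is a stand-in for the results of x.find(...): plain strings play the role of the matched elements and upper() plays the role of getText(), so none of these values come from the real page.

```python
# Stand-ins for the results of x.find(...): a str plays the role of a
# matched element, and None marks a product where find() matched nothing.
found = ["first description", None, "third description"]

# Calling a method on element 0 alone works, like descriptions[0].getText().
first = found[0].upper()

# Calling it on every element fails as soon as the None is reached.
try:
    all_upper = [item.upper() for item in found]
except AttributeError as exc:
    error_message = str(exc)

# Indices of the products for which nothing was found.
missing = [i for i, item in enumerate(found) if item is None]

print(first)          # FIRST DESCRIPTION
print(error_message)  # 'NoneType' object has no attribute 'upper'
print(missing)        # [1]
```

Running the last comprehension on the real descriptions list would show which products lack the element.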

Bill M.