1

I am trying to parse the HTML on this website.

I would like to get the text from all the span elements with class="post-subject".

Examples:

<span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span>

<span class="post-subject">Firestick/Old xbox games</span>

When I run my code below, soup.find() returns None, and I'm not sure why.

import requests
from bs4 import BeautifulSoup


page = requests.get('https://trashnothing.com/washington-dc-freecycle?page=1')
soup = BeautifulSoup(page.text, 'html.parser')

soup.find('span', {'class': 'post-subject'})
max
  • 1
    If you run `document.querySelector('.post-subject')` in the browser console on that page, it also returns `null`. Where do you see elements with a class of `post-subject`? Do you have to run a search or interact with the page first? If so, you'll need to do that before calling BeautifulSoup... – duhaime Jul 26 '18 at 00:49
  • 1
    The page needs a login. Check out `mechanize` or `Selenium` for ways to access webpages a little more interactively. – C. Braun Jul 26 '18 at 00:52

2 Answers

2

To help you get started, the second snippet below loads the page with Selenium; you will need to install the correct geckodriver for Firefox first. I do not see the class post-subject on the page you linked, but if the content is behind a login you can automate the button clicks, for example:

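from selenium.webdriver.common.by import By
# Selenium 4 removed find_element_by_id; use find_element(By.ID, ...) instead.
availbutton = driver.find_element(By.ID, 'buttonAvailability_1')
availbutton.click()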


from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://trashnothing.com/washington-dc-freecycle?page=1')

html = driver.page_source
soup = BeautifulSoup(html,'lxml')
print(soup.find('span', {'class': 'post-subject'}))
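
Before or after switching to Selenium, it can help to check whether the markup is in the fetched HTML at all. The sketch below is only an illustration: `subjects_from_html` is a hypothetical helper, and both sample strings are made up for the example (the first is modeled on the spans quoted in the question). You would pass in `page.text` or `driver.page_source` instead.

```python
from bs4 import BeautifulSoup


def subjects_from_html(html, class_name="post-subject"):
    """Return the text of every <span> with the given class, or [] if none."""
    # If the class name never appears in the raw HTML, no parser will find it:
    # the content is probably behind a login or rendered by JavaScript.
    if class_name not in html:
        return []
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True)
            for span in soup.find_all("span", class_=class_name)]


# Hypothetical sample markup, modeled on the spans quoted in the question.
sample = """
<div>
  <span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span>
  <span class="post-subject">Firestick/Old xbox games</span>
</div>
"""
# Hypothetical stand-in for a response that only contains a login form.
login_page = "<html><body><form id='login'></form></body></html>"

print(subjects_from_html(sample))
print(subjects_from_html(login_page))
```

An empty list for the real page would suggest the spans are not in the response you received, which points to a login or JavaScript issue and not a parsing bug.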
Ian-Fogelman
1

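I had the same issue. Changing the parser from html.parser to html5lib fixed it for me (html5lib has to be installed separately). Also, it's good practice to use soup.find_all() instead of soup.find() when you expect more than one match: find() returns only the first matching element, while find_all() returns all of them.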
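
As a rough illustration of the difference between the two calls, here is a small sketch run against a made-up HTML string modeled on the spans in the question. It uses html.parser so it needs no extra install; swapping in "html5lib" is an assumption you would have to test against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the spans quoted in the question.
html = """
<span class="post-subject">Set of 20 moving boxes (20009 or 20011)</span>
<span class="post-subject">Firestick/Old xbox games</span>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first = soup.find("span", class_="post-subject")
missing = soup.find("span", class_="no-such-class")

# find_all() returns a list-like ResultSet of every match (empty if none).
all_spans = soup.find_all("span", class_="post-subject")

print(first.get_text())
print([span.get_text() for span in all_spans])
print(missing)
```

Because find() can return None, check the result before calling methods such as get_text() on it.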

Tanveer Jan