
I am trying to build a web scraper with BS4 that grabs a specific date. The scraper runs, but it pulls the wrong date.

The trouble is that the dates share the same class. I tried selecting by id, but I get an empty result ([]). How else can I target this date and not the others?

import requests
from bs4 import BeautifulSoup

URL = 'https://nemsis.org/state-data-managers/state-map-v3/colorado'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

date = soup.find('span',class_='state-updated-on')
date = date.text

print(date)  

It returns February 16, 2017, but I am looking for 09/04/2019.


S. Caruso

1 Answer


The page is loaded dynamically, so requests alone won't see the date: it returns only the initial HTML and does not run the JavaScript that fills the date in. We can use Selenium as an alternative to scrape the page.
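To see why the lookup by id comes back empty with requests, here is a minimal sketch that runs the same search against two inline snippets. Both snippets are hypothetical stand-ins (the real markup of the page may differ): one imitates the raw response, the other imitates the DOM after the scripts have run. `find_date` is a helper name made up for this illustration.

```python
from bs4 import BeautifulSoup

SPAN_ID = "commitDate-refs/heads/release-3.4.0-3"

# Hypothetical stand-in for the raw response from requests:
# the span that JavaScript would fill in is not there yet.
raw_html = """
<div>
  <span class="state-updated-on">February 16, 2017</span>
</div>
"""

# Hypothetical stand-in for the DOM after the page's scripts have run.
rendered_html = """
<div>
  <span class="state-updated-on">February 16, 2017</span>
  <span class="state-updated-on" id="commitDate-refs/heads/release-3.4.0-3">9/4/2019</span>
</div>
"""


def find_date(html, span_id=SPAN_ID):
    """Return the text of the span with the given id, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id=span_id)
    return span.text if span else None


print(find_date(raw_html))       # None
print(find_date(rendered_html))  # 9/4/2019
```

If the same check against `page.content` from requests also gives None, the element is most likely added by JavaScript after the initial load.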

Install it with: pip install selenium.

Download the correct ChromeDriver from here.

from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

URL = "https://nemsis.org/state-data-managers/state-map-v3/colorado"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))
driver.get(URL)
# Give the page's JavaScript time to populate the date
sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.find("span", id="commitDate-refs/heads/release-3.4.0-3").text)

# Shut down the browser and the driver process
driver.quit()

Output:

9/4/2019
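A fixed sleep either wastes time or is too short on a slow connection. As a possible alternative, here is a sketch that uses Selenium's explicit wait to block only until the span is present. It assumes Selenium 4 and that Selenium can locate a matching ChromeDriver on its own; the helper names `extract_text_by_id` and `fetch_rendered_html` and the 15-second timeout are made up for this illustration.

```python
from bs4 import BeautifulSoup

URL = "https://nemsis.org/state-data-managers/state-map-v3/colorado"
SPAN_ID = "commitDate-refs/heads/release-3.4.0-3"


def extract_text_by_id(html, element_id):
    """Parse HTML and return the stripped text of the element with this id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    return element.get_text(strip=True) if element else None


def fetch_rendered_html(url, element_id, timeout=15):
    """Load url in Chrome and return the page source once element_id is in the DOM."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Selenium finds a matching ChromeDriver
    try:
        driver.get(url)
        # Blocks until the element exists; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even on a timeout


# Usage (opens a real browser, so it is left commented out here):
# html = fetch_rendered_html(URL, SPAN_ID)
# print(extract_text_by_id(html, SPAN_ID))
```

Note that `presence_of_element_located` only guarantees the element exists. If the span is created empty and filled in later, the wait condition would need to check its text as well.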
MendelG
  • To run Selenium in headless mode see: [Running Selenium with Headless Chrome Webdriver](https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver). – MendelG Jan 24 '21 at 01:41
  • @S.Caruso Glad to help! Consider marking this answer as accepted. – MendelG Jan 24 '21 at 05:09