Python WebScraper with BS4 and BeautifulSoup

Question

I am trying to create a web scraper with BS4 that will grab a specific date. I was able to build the scraper, but it is pulling the wrong date.

The trouble I am running into is that the dates share the same class. I tried selecting by id, but I get a result of `[]`. How else can I target this date and not the others?

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://nemsis.org/state-data-managers/state-map-v3/colorado'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

date = soup.find('span', class_='state-updated-on')
date = date.text

print(date)
```

It returns `February 16, 2017`, but I am looking for `09/04/2019`.
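A minimal sketch of why the class lookup prints the first date. The markup below is hypothetical (not the real page): two spans share `state-updated-on`, and only the second carries an id, `release-date`, which is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two spans share a class, only the second has an id
html = """
<div>
  <span class="state-updated-on">February 16, 2017</span>
  <span class="state-updated-on" id="release-date">9/4/2019</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match in document order
first = soup.find("span", class_="state-updated-on").text
# find_all() returns every match as a list
all_dates = [s.text for s in soup.find_all("span", class_="state-updated-on")]
# An id is meant to be unique on the page, so it singles out one element
by_id = soup.find("span", id="release-date").text

print(first)      # February 16, 2017
print(all_dates)  # ['February 16, 2017', '9/4/2019']
print(by_id)      # 9/4/2019
```

Selecting by id only works when the element is actually present in the HTML that BeautifulSoup parsed; an empty `[]` from `find_all(id=...)` means no such element was in that HTML.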

Can't you use `find_all` to get all the values and then use an index, i.e. `all_values[1]`, to get only the one value you need? – furas, Jan 24 '21 at 03:09
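A sketch of the comment's `find_all` plus index idea, alongside a variant that matches the text against an m/d/yyyy pattern so it does not depend on the element's position. The markup is hypothetical, and either approach only works once the spans are in the parsed HTML (for this page, that means parsing the rendered source, as the answer below does with Selenium).

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the real page's order and formatting may differ
html = """
<span class="state-updated-on">February 16, 2017</span>
<span class="state-updated-on">9/4/2019</span>
"""
soup = BeautifulSoup(html, "html.parser")

# The comment's suggestion: collect every match, then pick one by position
all_values = soup.find_all("span", class_="state-updated-on")
by_index = all_values[1].text

# Less tied to position: keep only the span whose text looks like m/d/yyyy
numeric_date = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")
by_pattern = soup.find("span", class_="state-updated-on", string=numeric_date).text

print(by_index)    # 9/4/2019
print(by_pattern)  # 9/4/2019
```

The index version breaks silently if the page adds or reorders dates; the pattern version breaks only if the date format changes.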

MendelG · Answer 1 · 2021-01-24T01:43:00.487


The page is loaded dynamically, so `requests` won't see the date: it only downloads the HTML the server sends and does not run JavaScript. We can use Selenium as an alternative to scrape the page.
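Before reaching for a browser, one way to confirm that an element is added by JavaScript is to check whether it exists in the HTML as served. A minimal sketch; `present_in_static_html` is a hypothetical helper name, and the id is the one from the answer's code:

```python
from bs4 import BeautifulSoup


def present_in_static_html(html, element_id):
    """Return True if an element with this id appears in the HTML as served.

    requests does not run JavaScript, so anything the browser inserts later
    will be missing from this HTML.
    """
    return BeautifulSoup(html, "html.parser").find(id=element_id) is not None


# Offline illustration with hypothetical HTML in which the id is absent,
# the situation in which find_all(id=...) returns []
served = '<div><span class="state-updated-on">February 16, 2017</span></div>'
print(present_in_static_html(served, "commitDate-refs/heads/release-3.4.0-3"))  # False
```

Against the live page you would pass `requests.get(URL).text` as `html`; a `False` result is consistent with the date being rendered client-side, which is when a browser-driven tool like Selenium helps.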

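Install it with `pip install selenium`.

Download the ChromeDriver version that matches your installed Chrome. (Selenium 4.6 and later include Selenium Manager, which can download a matching driver automatically.)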

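```python
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

URL = "https://nemsis.org/state-data-managers/state-map-v3/colorado"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))
try:
    driver.get(URL)
    # Wait for the page's JavaScript to finish rendering
    sleep(5)

    soup = BeautifulSoup(driver.page_source, "html.parser")
    # The wanted date has a unique id, unlike the shared class
    print(soup.find("span", id="commitDate-refs/heads/release-3.4.0-3").text)
finally:
    # quit() closes every window and stops the driver process
    driver.quit()
```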

Output:

9/4/2019

edited Jan 24 '21 at 01:43

answered Jan 24 '21 at 01:36

MendelG


To run Selenium in headless mode see: [Running Selenium with Headless Chrome Webdriver](https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver). – MendelG Jan 24 '21 at 01:41

@S.Caruso Glad to help! Consider marking this answer as accepted. – MendelG Jan 24 '21 at 05:09
