Scraping with Python bs4 text extract out of HTML

Question

I'm trying to extract the value from <div class="number">, as seen in the image below, but the result comes back empty. How do I go about getting that value?

The HTML:

[Image: the HTML I am trying to extract]

The code I have already tried:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
from pylogix import PLC

my_url = 'https://www.aeso.ca/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
report = page_soup.findAll("div",{"class":"number"})

print(report)

MendelG · Accepted Answer · 2020-12-08T22:13:28.183

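The page is rendered dynamically with JavaScript, so a plain HTTP fetch (urllib here, or requests) only receives the initial HTML, which does not yet contain these values. One alternative is to drive a real browser with Selenium and parse the rendered page.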
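To see why the original code comes back empty, here is a small sketch. The HTML string is a made-up stand-in for what a server might send before any script runs (the container class and script path are assumptions, not the real markup of the site). It shows what BeautifulSoup reports when the target element has not been created yet.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends before any script runs
STATIC_HTML = """
<html>
  <body>
    <div class="chart-price"></div>
    <script src="/js/dashboard.js"></script>
  </body>
</html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# find_all returns an empty list when nothing matches
all_matches = soup.find_all("div", {"class": "number"})
# find returns None when nothing matches
first_match = soup.find("div", {"class": "number"})

print(all_matches)
print(first_match)
```

The container is present in the static HTML, but the div with class "number" only appears after the browser runs the script, which is why a browser-driven approach is needed.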

Install it with: pip install selenium.

Download the ChromeDriver build that matches your installed Chrome version.

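from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


URL = "https://www.aeso.ca/"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))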

driver.get(URL)
# Wait for the page to fully render before parsing it
sleep(5)

# The rendered HTML is available in the driver's `page_source` attribute
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

report = soup.find_all("div", {"class": "number"})
print(report)

Output:

[<div class="number">10421 <span class="unit">MW</span></div>, <div class="number">$37.57 <span class="unit">/ MWh</span></div>]

To get only the text, use the .text attribute:

for tag in report:
    print(tag.text)

Output:

10421 MW
$37.57 / MWh

To only get the output for "Pool price", use a CSS Selector:

print(soup.select_one(".chart-price div.number").text)

# Or uncomment this to extract only the price, dropping `/ MWh` and the trailing space
# print(soup.select_one(".chart-price div.number").text.split("/")[0].strip())

Output (at the time of writing):

$37.57 / MWh
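If the value is needed as a number rather than as text, the extracted string can be split into a float and a unit. This is a minimal sketch that assumes the text always looks like the two samples above ("10421 MW" and "$37.57 / MWh"); the helper name `parse_reading` is hypothetical, and other formats such as negative prices are not handled.

```python
import re


def parse_reading(text):
    """Split text such as '$37.57 / MWh' into a (value, unit) pair."""
    # Optional "$", the number (commas allowed), an optional "/", then the unit
    match = re.match(r"\s*\$?([\d,]+(?:\.\d+)?)\s*/?\s*(.*)", text)
    if match is None:
        raise ValueError(f"Unrecognised reading: {text!r}")
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2).strip()
    return value, unit


print(parse_reading("$37.57 / MWh"))
print(parse_reading("10421 MW"))
```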

Works great, thank you very much. Is there any way not to show the web page when the script runs? — Boonz, Dec 09 '20 at 19:06
@Boonz You can run the browser in headless mode. See [this](https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver) post. — MendelG, Dec 09 '20 at 21:14
