BeautifulSoup Number Extraction

Question

So I'm trying to get the degrees from this weather site, but it keeps returning a blank answer. This is my code (link to a screenshot):

import requests
from bs4 import BeautifulSoup

# -----------------------------get site info------------------------------- #


URL = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"
request = requests.get(URL)
# print(request.content)

# ----------------------parse site info---------------- #

soup = BeautifulSoup(request.content, 'html5lib')

#print(soup.prettify().encode("utf-8"))

weatherdata = soup.find('span', class_='temp')

print(weatherdata)

Welcome to Stack Overflow. Please copy/paste your code rather than attach a picture. — Nicolas Gervais, Nov 04 '19 at 23:16

Rithin Chalumuri · Accepted Answer · 2019-11-04T23:39:31.503

It might be that those values are rendered dynamically, i.e. they are populated by JavaScript running in the page.

requests.get() simply returns the markup received from the server, without any client-side changes applied, so the issue is not just a matter of waiting.

You could use Selenium with the Chrome WebDriver to load the page URL and get the rendered page source (the Firefox driver works too).

Go to chrome://settings/help, check your current Chrome version, and download the driver for that version from here. Make sure to keep the driver file either in your PATH or in the same folder as your Python script.

Try this:

from bs4 import BeautifulSoup as bs
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

url = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"

# Make it headless, i.e. run in the background without opening a Chrome window
chrome_options = Options()
chrome_options.add_argument("--headless")

# Use Chrome to get the page with JavaScript-generated content.
# Selenium 4 takes the driver path via Service; older releases used executable_path=.
with Chrome(service=Service("./chromedriver"), options=chrome_options) as browser:
    browser.get(url)
    page_source = browser.page_source

# Parse the final page source
soup = bs(page_source, 'html.parser')

weatherdata = soup.find('span', class_='temp')

print(weatherdata.text)
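
Note that soup.find returns None when nothing matches, and the span's text is likely to be a string such as "7°" rather than a number (the exact format on that site is an assumption here). A minimal sketch for pulling an integer out of such text with only the standard library; parse_degrees is a hypothetical helper name, not part of BeautifulSoup:

```python
import re


def parse_degrees(text):
    """Return the first signed integer found in text, or None if there is none."""
    if not text:
        # Covers both None (span not found) and an empty, not-yet-populated span
        return None
    # Some sites use the Unicode minus sign (U+2212) for negative temperatures
    normalised = text.replace("\u2212", "-")
    match = re.search(r"-?\d+", normalised)
    return int(match.group()) if match else None
```

It could be called as parse_degrees(weatherdata.text if weatherdata else None), so a missing span gives None instead of an AttributeError.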

References:

Get page generated with Javascript in Python

selenium - chromedriver executable needs to be in PATH

Still doesn't return the value. I'm looking for the temperature. — Noah Hamilton, Nov 04 '19 at 23:34
@NoahHamilton, Glad it worked, could you please help accept the answer? :) thanks — Rithin Chalumuri, Nov 05 '19 at 00:17

score 0 · Answer 2 · answered Nov 04 '19 at 23:36

The problem seems to be that the data is loaded via JavaScript, so the value for that specific span takes a while to appear. When you make your request, the span is still empty and is only filled in after a bit. One possible solution is to use Selenium to wait for the page to load and then extract the HTML afterwards.

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"
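browser = webdriver.Chrome()
browser.get(url)  # returns once the page's load event has fired
html = browser.page_source
browser.quit()  # close the browser once the HTML has been captured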

soup = BeautifulSoup(html, 'html.parser')
elem = soup.find('span', class_='temp')

print(elem.text)
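
The snippet above reads page_source as soon as get() returns, so the span can still be empty if a script fills it in later. One way to wait explicitly is Selenium's WebDriverWait, which keeps calling a condition until it returns something truthy or the timeout expires. A rough sketch, assuming Selenium 4 with a Chrome driver it can locate; span_text and fetch_temperature are hypothetical names, and the span.temp selector is taken from the question:

```python
def span_text(driver, css_selector="span.temp"):
    """Wait condition: return the element's text once it is non-empty, else False."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    for element in driver.find_elements("css selector", css_selector):
        text = element.text.strip()
        if text:
            return text
    return False


def fetch_temperature(url, timeout=15):
    """Open url in headless Chrome and wait until the temperature span has text."""
    # Imported here so span_text can be tried out without Selenium installed
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    with Chrome(options=options) as browser:
        browser.get(url)
        # Calls span_text(browser) repeatedly until it returns a truthy value;
        # raises selenium's TimeoutException if that never happens in time
        return WebDriverWait(browser, timeout).until(span_text)
```

fetch_temperature(url) would then return the span's text once the script has populated it, which sidesteps guessing how long to sleep.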
