
I'm trying to extract an element from this site, specifically the current temperature.

This is the element I am attempting to extract with BeautifulSoup4:

<p class="temperature">-1<span>°C</span></p>

The following Python code is supposed to extract that element from the site:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.theweathernetwork.com/ca/weather/ontario/mississauga')

soup = BeautifulSoup(response.content, 'lxml')

print(soup.find_all('p', {'class': 'temperature'}))

It just returns an empty list:

[]
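One way to confirm that the `find_all` call itself is fine is to run it against the element quoted above instead of the downloaded page. The sketch below does that; it uses the built-in `"html.parser"` instead of `lxml` only to avoid the extra dependency. If it finds the tag here but not in `response.content`, the element is simply not in the HTML that `requests` receives.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, i.e. what the browser shows after the page's scripts ran
rendered_snippet = '<p class="temperature">-1<span>°C</span></p>'

soup = BeautifulSoup(rendered_snippet, "html.parser")
matches = soup.find_all("p", {"class": "temperature"})
print(matches)  # [<p class="temperature">-1<span>°C</span></p>]

p = matches[0]
value = str(p.contents[0])  # the text node before the <span>: "-1"
unit = p.span.get_text()    # the text inside the <span>: "°C"
print(value, unit)
```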

I would really appreciate any help with this.

Note: I am new to Python.

    The detail you want is loaded via JavaScript, so python-requests alone is not enough. It's coming out as empty because it **is** empty in the HTML you download. What you're doing is web scraping. http://stackoverflow.com/questions/26393231/using-python-requests-with-javascript-pages – munsu Mar 17 '17 at 01:56
  • I see. So what library do you recommend I use to extract the data? – Curious Spider Mar 17 '17 at 02:04

1 Answer


Okay, so as @RobinAnupol mentioned, you have several options depending on how closely you want to mimic a real browser:

  1. Open the website manually in a browser and observe the API calls the site's JavaScript makes (for example in the Network tab of the developer tools). Replicate them using requests in Python.

  2. Use a JavaScript rendering engine like Splash.

  3. Use Selenium with a real browser (there are drivers for Chrome, IE, Firefox, PhantomJS, etc.).
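Option 2 could look roughly like the sketch below. It assumes a Splash instance is running locally on its default port 8050 (for example started with `docker run -p 8050:8050 scrapinghub/splash`) and uses Splash's `render.html` HTTP endpoint with its `url` and `wait` parameters. The helper names `render_with_splash` and `extract_temperature` are illustrative, and the `p.temperature` selector is taken from the question; the site's markup may have changed since then.

```python
import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://www.theweathernetwork.com/ca/weather/ontario/mississauga"
# Assumption: Splash is running locally on its default port
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def render_with_splash(page_url, wait=2.0):
    """Ask Splash to load the page, run its JavaScript, and return the resulting HTML."""
    response = requests.get(
        SPLASH_RENDER_URL,
        params={"url": page_url, "wait": wait},  # wait: seconds to let scripts run
        timeout=60,
    )
    response.raise_for_status()
    return response.text


def extract_temperature(html):
    """Return the text of the first <p class="temperature">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("p", class_="temperature")
    return tag.get_text(strip=True) if tag is not None else None
```

With Splash running, `print(extract_temperature(render_with_splash(PAGE_URL)))` would fetch and parse the rendered page. Splitting fetching from parsing lets you reuse `extract_temperature` on HTML obtained by any of the three approaches.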

  • I just tested it out with Selenium, and it works as planned. It is slower compared to requests, however that could be because the text I am trying to extract is generated by JavaScript and not present in the HTML. – Curious Spider Mar 17 '17 at 02:35
  • That's great. If you want, accept this answer so that the question doesn't appear as unanswered. – Giannis Spiliopoulos Mar 17 '17 at 02:37