I am just starting to learn web scraping with Python, and I am having trouble with my code. I am trying to scrape the weather forecast from the front page of yahoo.com. This is the relevant HTML:
<div class="Ai(c) D(f) Jc(sb) Fz(13px) Py(0) Px(0)">
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Today</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/fair_day.png");"></i>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">59<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Wed</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png");"></i>
<span class="Hidden">Partly cloudy today with a high of 74 °F (23.3 °C) and a low of 51 °F (10.6 °C).</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)"><span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Thu</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png");"></i>
<span class="Hidden">Partly cloudy today with a high of 84 °F (28.9 °C) and a low of 51 °F (10.6 °C).</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Fri</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/scattered_showers_day_night.png");"></i>
<span class="Hidden">Scattered thunderstorms today with a high of 84 °F (28.9 °C) and a low of 65 °F (18.3 °C). There is a 35% chance of precipitation.</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">65<span>°</span></span></div></div></div>
This is the code I have come up with to try and pull this information:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.yahoo.com/')
soup = BeautifulSoup(r.content, 'html.parser')
weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")
for row in weatherTable.select("div.D(f).Ai(c).Fld(c)"):
    day = row.select_one("span.Fw(600).Fz(12px).Mb(10px).C($c-fuji-grey-n).Fz(1em)").text
    dWeather = row.select_one("span.C($c-fuji-grey-n).Pend(5px).unit_F").text
    nWeather = row.select_one("span.C($c-fuji-grey-o).unit_F").text
    print(day, dWeather, nWeather)
When I try to run my code, I get the following error:
Traceback (most recent call last):
  File "C:\Users\smith\eclipse-workspace\Practice\src\DecodeWeb.py", line 9, in <module>
    weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1834, in select_one
    value = self.select(selector, namespaces, 1, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1869, in select
    results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 98, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 62, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 211, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1058, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 909, in parse_selectors
    key, m = next(iselector)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1051, in selector_iter
    raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Invalid character '(' position 6
line 1:
div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)
Do I have to escape or substitute the special characters (the parentheses and the $) so that BS4 can parse selectors containing these class names?
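
One direction that may be worth trying is CSS escaping: the parentheses and the $ in these class names are not valid in an unescaped CSS identifier, so each would need a backslash in front of it. The sketch below is a minimal illustration under stated assumptions, not a confirmed fix. It assumes the installed soupsieve exposes `soupsieve.escape`, it runs against a trimmed copy of the markup above rather than the live page (the HTML that `requests` receives may differ from what the browser shows), and `class_selector` and `read_forecast` are hypothetical helper names introduced only for this example.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Trimmed copy of the markup from the question (one day only).
HTML = '''
<div class="Ai(c) D(f) Jc(sb) Fz(13px) Py(0) Px(0)">
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Today</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">59<span>°</span></span></div></div>
</div>
'''


def class_selector(tag, *classes):
    """Hypothetical helper: build 'tag.cls1.cls2' with every class name CSS-escaped."""
    return tag + "".join("." + sv.escape(name) for name in classes)


def read_forecast(html):
    """Hypothetical helper: return a list of (day, high, low) tuples found in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(class_selector("div", "Ai(c)", "D(f)", "Jc(sb)"))
    if table is None:
        # The container was not found, e.g. the page served different markup.
        return []

    rows = []
    for row in table.select(class_selector("div", "D(f)", "Ai(c)", "Fld(c)")):
        day = row.select_one(class_selector("span", "Mb(10px)"))
        high = row.select_one(class_selector("span", "C($c-fuji-grey-n)", "unit_F"))
        low = row.select_one(class_selector("span", "C($c-fuji-grey-o)", "unit_F"))
        if day is None or high is None or low is None:
            continue  # Skip rows that do not have all three pieces.
        rows.append((
            day.get_text(strip=True),
            high.get_text(strip=True),
            low.get_text(strip=True),
        ))
    return rows


if __name__ == "__main__":
    for day, high, low in read_forecast(HTML):
        print(day, high, low)
```

Writing the backslashes by hand in a raw string, for example `soup.select_one(r"div.Ai\(c\).D\(f\)")`, should be equivalent. The helper just avoids typing each escape and shortens the selectors to the few classes that seem distinctive, which is a judgment call about this markup.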