How to scrape using html class name?

Question

I'm trying to understand how scraping works and how to locate the data I want. I'm trying to scrape this website: https://fr.finance.yahoo.com/

When I used the code below:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://fr.finance.yahoo.com/')
soup = BeautifulSoup(page.text,'html.parser')

y = soup.find(id="market-summary")
print(y)

The result is what I'm looking for.

However, when I try to replicate the results using the code:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://fr.finance.yahoo.com/')
soup = BeautifulSoup(page.text,'html.parser')

x = soup.find("div", class_= 'Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)')
print(x)

I get "None" as a result. Could someone please explain what I am doing wrong? How can I use the class attribute to find the data I'm looking for?

This [post](https://stackoverflow.com/questions/41687476/using-beautiful-soup-to-find-specific-class) reports the same issue with a different site and suggests the problem is that the site's HTML is not "well-formed". I verified this for this site by running its page source through an [HTML validator](https://validator.w3.org/#validate_by_input+with_options) (I pasted the source directly, since the validate-by-URL option doesn't work). However, none of the remedies in the answers worked for this site. – DarrylG, May 24 '20 at 13:59
There's a GDPR cookie request on the site. Those sometimes interfere with scraping. See https://stackoverflow.com/questions/57462036/how-can-i-bypass-a-cookie-agreement-page-while-web-scraping-using-python – JL Peyret, May 24 '20 at 18:23
Also, when I try a CSS selector for those classes, I get `SyntaxError: Document.querySelector: '.Whs(nw).D(ib).Bgc($lv2BgColor).W(100%).Bxz(bb)' is not a valid selector utils.js:332:11`, so I suspect that this is somewhat deliberate and that those class names may need escaping, at least in `soup.select` rather than `soup.find`. – JL Peyret, May 24 '20 at 18:28
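
To illustrate the escaping idea from the last comment, here is a minimal sketch. It assumes the target `div` is actually present in the parsed HTML, which may not be true of the page that `requests` receives from Yahoo. The `html` string is a hypothetical stand-in, not the real page, and the `id` on it is only there to make the match visible.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real Yahoo page is not fetched here.
html = (
    '<div id="market-summary" '
    'class="Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)">indices</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Characters such as ( ) $ % are not valid in a bare CSS class name,
# so each one is backslash-escaped. A raw string keeps the backslashes intact.
selector = r"div.Whs\(nw\).D\(ib\).Bgc\(\$lv2BgColor\).W\(100\%\).Bxz\(bb\)"

match = soup.select_one(selector)
print(match["id"] if match else None)
```

If the escaped selector matches on a local snippet like this but not on the downloaded page, the selector syntax is probably not the cause, and the downloaded HTML is worth inspecting instead.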

Pulkit Bansal · Answer 1 · 2020-05-24T12:53:11.040

0

Try:

x = soup.find("div",attrs={"class":"Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)"})
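
For reference, `attrs={"class": ...}` and `class_=...` are two spellings of the same filter in Beautiful Soup, so both should behave alike. The sketch below uses hypothetical stand-in markup (not the real Yahoo response) to show that either form matches when the full class string is present and returns `None` when the element is missing from the parsed HTML.

```python
from bs4 import BeautifulSoup

CLASSES = "Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)"

# Hypothetical stand-in documents: one contains the div, one does not
# (for example, a consent page served in place of the real content).
with_div = BeautifulSoup(f'<div class="{CLASSES}">summary</div>', "html.parser")
without_div = BeautifulSoup('<div class="consent">cookies</div>', "html.parser")

# Both spellings compare against the full class attribute string.
a = with_div.find("div", class_=CLASSES)
b = with_div.find("div", attrs={"class": CLASSES})
print(a is not None, b is not None)

# When the element is absent from the parsed HTML, find() returns None.
missing = without_div.find("div", attrs={"class": CLASSES})
print(missing)
```

A quick check along the same lines on the real response is to test whether the class string occurs in `page.text` at all before calling `find`.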

edited May 24 '20 at 12:53

answered May 24 '20 at 12:47


Consider revising the answer to explain what the code does and how it answers the OP's question. – mnm May 24 '20 at 13:06

Thanks for your reply. However, the code provided also results in "None". – AATU May 24 '20 at 13:11
