I'm trying to understand how scraping works and how to locate the data I'm looking for. I'm trying to scrape the website below:

website

When I used the code below:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://fr.finance.yahoo.com/')
soup = BeautifulSoup(page.text,'html.parser')

y = soup.find(id="market-summary")
print(y)

The result is what I'm looking for.

However, when I try to replicate the results using the code:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://fr.finance.yahoo.com/')
soup = BeautifulSoup(page.text,'html.parser')

x = soup.find("div", class_= 'Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)')
print(x)

I get "None" as a result. Could someone please explain what I am doing wrong? How can I use the "class" attribute to find the data I'm looking for?

martineau
AATU
  • This [post](https://stackoverflow.com/questions/41687476/using-beautiful-soup-to-find-specific-class) reports the same issue with a different site and suggests the problem is that the site's HTML is not "well-formed". Running this site's source through an [HTML validator](https://validator.w3.org/#validate_by_input+with_options) confirms that (I pasted the source, since the validate-by-URL option doesn't work). However, none of the remedies in the answers worked for this site. – DarrylG May 24 '20 at 13:59
  • There's a GDPR cookie request on the site. Those sometimes interfere with scraping. See https://stackoverflow.com/questions/57462036/how-can-i-bypass-a-cookie-agreement-page-while-web-scraping-using-python – JL Peyret May 24 '20 at 18:23
  • Also, when trying to use a CSS selector for those classes, I get `SyntaxError: Document.querySelector: '.Whs(nw).D(ib).Bgc($lv2BgColor).W(100%).Bxz(bb)' is not a valid selector utils.js:332:11`, so I suspect that this is somewhat deliberate and that those class names may need escaping, at least in `soup.select` rather than `soup.find`. – JL Peyret May 24 '20 at 18:28
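
One way to follow up on the comments above is to inspect what `requests` actually received before searching by class. The sketch below is only a diagnostic under stated assumptions: `summarize_fetch` is a hypothetical helper (not part of `requests` or Beautiful Soup), and the sample URLs and HTML strings are made up so the example runs offline. It reports whether the final URL differs from the requested one (which can happen with a consent page) and which classes the target element really carries in the fetched HTML.

```python
from bs4 import BeautifulSoup


def summarize_fetch(requested_url, final_url, html, element_id):
    """Hypothetical helper: describe what the fetched HTML contains."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    return {
        # True when the server sent us somewhere else (e.g. a consent page).
        "redirected": final_url != requested_url,
        # True when the element we want exists in the raw HTML.
        "found": tag is not None,
        # The class list exactly as served, useful for comparing with DevTools.
        "classes": tag.get("class", []) if tag is not None else [],
    }


# Made-up responses standing in for page.url and page.text.
requested = "https://fr.finance.yahoo.com/"
consent_html = "<html><body><form class='consent-form'></form></body></html>"
real_html = "<div id='market-summary' class='Whs(nw) D(ib) Other(x)'></div>"

consent_report = summarize_fetch(
    requested, "https://consent.example/collect", consent_html, "market-summary"
)
real_report = summarize_fetch(requested, requested, real_html, "market-summary")

print(consent_report)
print(real_report)
```

With a live request you would pass `page.url` and `page.text` instead of the sample strings. If the printed class list differs from what the browser's inspector shows, the class string copied from the browser cannot match as-is.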

1 Answer

Try:

x = soup.find("div",attrs={"class":"Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)"})
  • Consider revising the answer with a detailed explanation of what the code does and how it answers the OP's question. – mnm May 24 '20 at 13:06
  • Thanks for your reply. However, the code provided also results in "None". – AATU May 24 '20 at 13:11
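
A possible explanation for why both spellings return "None": in Beautiful Soup, `class_="..."` and `attrs={"class": "..."}` are the same search, and a string containing several classes only matches when it equals the element's complete class attribute exactly. The sketch below uses made-up markup (the extra `Extra(1)` class is an assumption, not taken from the real page) to show the difference between that exact match, a single-class match, and a subset match.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's attributes may differ.
html = """
<div id="market-summary"
     class="Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb) Extra(1)">
  <span>CAC 40</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

partial = "Whs(nw) D(ib) Bgc($lv2BgColor) W(100%) Bxz(bb)"

# A multi-class string must equal the whole class attribute, so both
# spellings miss this div because of the additional trailing class.
by_keyword = soup.find("div", class_=partial)
by_attrs = soup.find("div", attrs={"class": partial})

# A single class token matches no matter what other classes are present.
by_token = soup.find("div", class_="Bgc($lv2BgColor)")

# Require a set of classes, in any order, with a predicate function.
wanted = set(partial.split())


def has_wanted_classes(tag):
    return tag.name == "div" and wanted.issubset(tag.get("class", []))


by_subset = soup.find(has_wanted_classes)

# Attribute selectors with quoted values avoid escaping "(", "$" and "%".
by_css = soup.select_one('div[class~="Whs(nw)"][class~="Bxz(bb)"]')

print(by_keyword, by_attrs)
print(by_token["id"], by_subset["id"], by_css["id"])
```

Searching by a single distinctive class, or by `id` as in the first snippet of the question, is usually less brittle than copying the full class string from the browser.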