Scraping data using Python 3.x, Beautiful Soup and urllib.request

Question

I started learning Python recently, and as a project I've been asked to learn how to scrape data from websites. I'm confused because I'm even newer to HTML. When I do this in Python:

price_box = soup.find('div', attrs={'class':'price'})

I don't see where the class name is shown as simply 'price' for the price of a stock on https://www.bloomberg.com/quote/SPX:IND.

To me, the class is defined as follows:

<span class="priceText__1853e8a5">2,711.66</span>

Can someone explain what I'm missing or where my mistake is?

EDIT: I've been using the tutorial below to help. I copied the code across and it works, but when I use Inspect Element to see for myself, I don't see what's happening.

https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe

score 2 · Accepted Answer · answered Jun 29 '18 at 18:40

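The specific answer to your question is that the element's class is 'priceText__1853e8a5', not 'price', and in the markup you quoted it sits on a span rather than a div, so soup.find('div', attrs={'class':'price'}) has nothing to match. You can also pass class_='className' instead of spelling out the tag name and an attrs dictionary.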
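To make the class matching concrete, here is a minimal sketch that runs against a hard-coded HTML string rather than the live page. The HTML is made up, modelled on the span quoted in the question. The prefix match at the end rests on my assumption that the suffix after 'priceText__' is auto-generated and may change; I have not confirmed that for this site.

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML modelled on the span quoted in the question.
html = '<div><span class="priceText__1853e8a5">2,711.66</span></div>'
soup = BeautifulSoup(html, "html.parser")

# 'price' is not one of the element's class names, so nothing matches.
no_match = soup.find("div", attrs={"class": "price"})

# The full class name matches with either spelling of the filter.
by_attrs = soup.find("span", attrs={"class": "priceText__1853e8a5"})
by_kwarg = soup.find(class_="priceText__1853e8a5")

# Assumption: the suffix may change, so match on the stable-looking prefix.
by_prefix = soup.find(class_=re.compile(r"^priceText"))

print(no_match)       # None
print(by_kwarg.text)  # 2,711.66
```

Beautiful Soup compares a class filter given as a string against each whole class name on the element, which is why 'price' does not match 'priceText__1853e8a5' even though one string begins with the other.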

Since the content you want to scrape is not part of the static HTML, and is instead filled in by a script after the initial page load, you must let the information populate before scraping the page. I used Selenium for this and added time.sleep(5) to wait 5 seconds for the information to load. Then I used browser.page_source to get the rendered source of the page. Finally, I passed that source to BeautifulSoup and searched the soup for the tag to pull the text from it.

As a side note, I used find_all() instead of find() because I didn't know whether more than one tag would have the same class.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

# Path to my chromedriver; replace it with your own.
# (Selenium 4 takes the path via Service; Selenium 3 used executable_path=.)
driver_path = r"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\selenium\webdriver\chromedriver_win32\chromedriver.exe"
browser = webdriver.Chrome(service=Service(driver_path))
browser.maximize_window()
browser.get('https://www.bloomberg.com/quote/SPX:IND')
time.sleep(5)  # wait 5 seconds for the page's JavaScript to load the data
page_source = browser.page_source
browser.quit()  # close the browser once the source has been captured
soup = BeautifulSoup(page_source, 'html.parser')
prices = soup.find_all(class_='priceText__1853e8a5')
price = prices[0].text  # first match; raises IndexError if nothing matched
print(price)
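The fixed time.sleep(5) in the code above waits the full five seconds even when the data arrives sooner, and fails if the page takes longer. One alternative is to poll until the element shows up. Below is a rough sketch of that idea in plain Python; wait_until and fake_page_ready are hypothetical names I made up for illustration, and the fake condition stands in for a real page so the snippet runs without a browser. Selenium ships its own explicit-wait helper, WebDriverWait, built on the same idea.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)

# Stand-in for a page that only has the data on the third poll.
attempts = []

def fake_page_ready():
    attempts.append(1)
    return ["2,711.66"] if len(attempts) >= 3 else []

prices = wait_until(fake_page_ready, timeout=5.0, interval=0.1)
print(prices[0])  # 2,711.66
```

With the real browser, the condition would be a small function that rebuilds the soup from browser.page_source and returns soup.find_all(class_='priceText__1853e8a5'); an empty list is falsy, so polling continues until at least one tag is found.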

OK, that makes sense. I was confused because the tutorial searched for 'price' and it worked and pulled all the info, but when I looked in Inspect Element it said priceText__1853... rather than price. – bobdaves69, Jun 30 '18 at 19:59
