Good morning, everyone.
I have recently started using BeautifulSoup, watching videos and reading up on it, with the intention of scraping share price information daily and adding it to a previously populated CSV file that contains historical share prices.
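For the end goal (adding each day's price to the existing CSV of historical prices), here is a minimal sketch in Python 3 using the standard csv module. The function name append_price and the two-column layout (ISO date, price) are assumptions; match them to whatever columns the existing file already has. Under Python 2.7 the csv module expects the file to be opened in binary append mode ("ab") rather than with newline="".

```python
import csv
import datetime


def append_price(csv_path, price, day=None):
    """Append one (date, price) row to an existing CSV of historical prices.

    Assumes a hypothetical two-column layout: ISO date, price.
    The file should already end with a newline, otherwise the new row
    is joined onto the last existing line.
    """
    day = day or datetime.date.today()
    # Mode "a" appends instead of overwriting the historical data;
    # newline="" lets the csv module control line endings (Python 3).
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([day.isoformat(), price])
```

Called once a day, for example append_price("kio_prices.csv", price) with a hypothetical file name, it leaves the earlier rows untouched and adds today's row at the end.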
I have tried several changes to my code (below), but regardless of whether I search for a "div" or a "span" element together with its full class name, all that prints in the console is an empty list, "[]".
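One way to tell whether the problem is the search or the page itself is to list the class attributes that the downloaded HTML actually contains for a given tag, and compare them with what the browser's inspector shows. A minimal sketch using Python 3's standard html.parser, so it runs without BeautifulSoup; ClassCollector and classes_for are hypothetical helper names, and the input is assumed to be the page already decoded to a str. If "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" is missing from the result, the element is probably added or re-rendered by JavaScript in the browser, and no parser will find it in the raw download.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record the class attribute of every start tag with the given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            # attrs is a list of (name, value) pairs; None if no class is set
            self.classes.append(dict(attrs).get("class"))


def classes_for(html, tag):
    """Return the class attribute values of every <tag> in the raw HTML."""
    parser = ClassCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.classes
```

A related detail: when the class value given to find_all contains spaces, BeautifulSoup compares it against the whole class attribute string, so every class must appear in exactly that order. Passing a single class such as "Trsdu(0.3s)" instead matches any tag that has that class among its others.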
The site I was using is Yahoo Finance, so I tried another site, Sharenet, and had the same problem. I then tried scraping another part of the page (only the share name) and again got empty brackets. The only time I get a result is when I scrape a "div" that has several items nested inside it. In that printout I can see the share price, but surely there is a way to get ONLY the price?
I have been using the following YouTube video as a guide, which has been very helpful, along with a previous post on here about a similar problem, but I still run into trouble:
https://www.youtube.com/watch?v=XQgXKtPSzUI
Previous post: "import yahoo finance stock price with beautifulsoup and request"
Below is my code (I am using Python 2.7):

```python
import urllib2
from bs4 import BeautifulSoup as soup

# Open the connection and download the webpage
kio_site = urllib2.urlopen("https://finance.yahoo.com/quote/KIO.JO?p=KIO.JO")
# Read all of the page's HTML into a string
kio_html = kio_site.read()
# Close the connection now that the page has been read
kio_site.close()
# Parse the HTML
page_soup = soup(kio_html, "html.parser")
# Find the specific elements that should hold the share price
kio_info = page_soup.find_all("span", {"class": "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"})
print kio_info
```
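One thing worth ruling out is that the server sends a script different HTML from what a browser receives, for instance because of the default Python user agent. A minimal sketch in Python 3, where urllib.request takes over urllib2's role; the browser-like header value is an illustrative assumption, build_request and fetch_html are hypothetical names, and whether the header changes what this particular site returns is untested.

```python
from urllib.request import Request, urlopen

# Illustrative browser-like header (an assumption, not a known requirement)
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url):
    """Build a GET request that sends a browser-like User-Agent header."""
    return Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Download a page and return its body decoded as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Returning a decoded str makes it easy to search the raw text, for example "Trsdu(0.3s)" in fetch_html(url), before involving a parser at all. Under Python 2.7 the equivalent is urllib2.Request(url, headers=HEADERS) passed to urllib2.urlopen.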
When I use the following line instead, I do get a result, but the share price is buried among everything else in that div:

```python
kio_info = page_soup.find_all("div", {"class": "My(6px) smartphone_Mt(15px)"})
```
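To pull out only the price from that div rather than printing all of it, one option is to go one level deeper and take the text of the first span inside it. In BeautifulSoup that would look roughly like div = page_soup.find("div", {"class": "My(6px) smartphone_Mt(15px)"}) followed by div.find("span").get_text(strip=True). The sketch below shows the same idea with Python 3's standard html.parser so it runs without bs4; it assumes (unverified) that the price is the first span in that div, and FirstSpanInDiv and first_span_text are hypothetical names.

```python
from html.parser import HTMLParser


class FirstSpanInDiv(HTMLParser):
    """Capture the text of the first <span> nested inside a <div> whose
    class attribute equals div_class exactly."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.div_depth = 0      # > 0 while inside the target div
        self.in_span = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1          # nested div inside the target
            elif dict(attrs).get("class") == self.div_class:
                self.div_depth = 1           # entered the target div
        elif tag == "span" and self.div_depth and self.text is None:
            self.in_span = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "span":
            # Simplification: spans nested inside the span are not tracked
            self.in_span = False

    def handle_data(self, data):
        if self.in_span:
            self.text += data


def first_span_text(html, div_class):
    """Return the stripped text of the first span in the div, or None."""
    parser = FirstSpanInDiv(div_class)
    parser.feed(html)
    parser.close()
    return None if parser.text is None else parser.text.strip()
```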
In that printout I also saw a "data-reactid" attribute with the value "14" just before the share price number, but even when I included it in my search (along with "span" and the "Trsdu(0.3s)..." class), it still did not return the price.
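Because data-reactid contains a hyphen, it cannot be passed to find_all as a keyword argument; BeautifulSoup's documented route is the attrs dictionary, e.g. page_soup.find("span", attrs={"data-reactid": "14"}). When the class is also put in that dictionary, both have to match, so an inexact class string still yields nothing. These IDs are generated by React and may change if the page layout changes, so treat them as a fragile hook. The sketch below is the attribute-only lookup written with Python 3's html.parser so it runs without bs4; AttrText and text_by_attr are hypothetical helpers.

```python
from html.parser import HTMLParser


class AttrText(HTMLParser):
    """Capture the text of the first <tag> whose attribute `attr` equals `value`."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if (self.text is None and tag == self.tag
                and dict(attrs).get(self.attr) == self.value):
            self.capturing = True
            self.text = ""

    def handle_endtag(self, tag):
        # Stop at the first matching close tag (nested tags of the same
        # name are not tracked in this sketch)
        if self.capturing and tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.text += data


def text_by_attr(html, tag, attr, value):
    """Return the stripped text of the first matching tag, or None."""
    parser = AttrText(tag, attr, value)
    parser.feed(html)
    parser.close()
    return None if parser.text is None else parser.text.strip()
```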
Could it be that I should not be reading the page as HTML at all? I tried using lxml instead, but got an error.
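On whether the page should be read as something other than HTML: pages built with JavaScript frameworks often ship their data as JSON inside a script tag, and the browser fills in the visible HTML from it. A minimal sketch in Python 3 that searches the raw page text for such a fragment with the standard re and json modules. The key name "regularMarketPrice" and the {"raw": ..., "fmt": ...} shape are assumptions based on how Yahoo Finance pages have embedded quote data (for example inside a root.App.main = {...} assignment); confirm them with View Source before relying on this, and note that price_from_embedded_json is a hypothetical name.

```python
import json
import re

# Assumed shape of the embedded fragment, e.g.
#   "regularMarketPrice":{"raw":123.45,"fmt":"123.45"}
# Verify the key name against the real page source first.
PRICE_PATTERN = re.compile(r'"regularMarketPrice"\s*:\s*(\{[^{}]*\})')


def price_from_embedded_json(html):
    """Return the first regularMarketPrice "raw" value found, or None."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        return None
    # The captured group is a flat JSON object, so json can parse it directly
    return json.loads(match.group(1)).get("raw")
```

The regex only captures a flat object (no nested braces), which keeps it simple but means it will miss the value if the embedded structure is shaped differently from the assumption above.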
Thank you in advance for any help!