
Good morning everyone

I have recently started using BeautifulSoup, watching videos and reading up on it, with the intention of scraping share price information daily and adding it to a previously populated CSV file that contains historical share prices.

I have tried several amendments to my code (below), and regardless of whether I use the "div" or "span" element and add the full class name, I end up with empty brackets "[]" printed in the console.

The site I was using is Yahoo Finance, so I tried another site, Sharenet, and hit the same problem. I then tried scraping another part of the website (the share name only) - also empty brackets. The only time I receive a result is when I scrape a "div" that has several items nested within it - in that printout I can see the share price info, but surely there is a way to get ONLY the price?

I have been using the following video on YouTube as a guide, which has been very helpful, along with a previous post on here with a similar problem, but I still get the same result.

https://www.youtube.com/watch?v=XQgXKtPSzUI

import yahoo finance stock price with beautifulsoup and request

Below is my code (I am using python 2.7):

import urllib2
from bs4 import BeautifulSoup as soup

#Open the connection and download the webpage
kio_site = urllib2.urlopen("https://finance.yahoo.com/quote/KIO.JO?p=KIO.JO")

#Read all the HTML from the response
kio_html = kio_site.read()
#Close the connection that was opened above
kio_site.close()

#Parse the HTML
page_soup = soup(kio_html, "html.parser")

#Find the specific elements by tag and class
kio_info = page_soup.find_all("span", {"class":"Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"})
print kio_info

When I instead use the following code, I get a result, but the share price is buried within all the mess:

kio_info = page_soup.find_all("div", {"class":"My(6px) smartphone_Mt(15px)"})

Within the printout I also saw that there is a data-reactid="14" just before the share price number, but even when I included this in my code (along with the "span" and the "Trsdu(0.3s)" etc. class) it still did not give me the price.

Could it be that I should not be reading the webpage as HTML? I tried using the lxml parser instead but got an error.

Thank you in advance for any help!

Rip_027

1 Answer


I'd recommend using the requests library, but that's not the problem here. The website is recognizing the Python script because of the default User-Agent and is returning a different response.

You can pass a fake User-Agent with the requests module to make the script look like a real browser.
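To see the tell-tale default for yourself, you can ask the library directly (a quick check, assuming requests is installed):

import requests

# With no headers argument, requests identifies itself as something like
# "python-requests/2.18.4", which is easy for a site to single out
print(requests.utils.default_user_agent())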

You can use this:

import requests
from bs4 import BeautifulSoup

# Pretend to be a desktop Chrome browser so the site serves the normal page
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
r = requests.get('https://finance.yahoo.com/quote/KIO.JO?p=KIO.JO', headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

# The same class lookup now finds the price span
print(soup.find_all("span", {"class": "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"}))

Output:

[<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="35"><!-- react-text: 36 -->33,101.00<!-- /react-text --></span>]

Or, use this to get the value:

print(soup.find("span", {"class": "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"}).text)

Output:

33,101.00
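
Since your end goal is appending the daily price to an existing CSV, here is a minimal sketch that continues from the code above (the kio_prices.csv filename and the date,price column layout are assumptions - adjust them to match your file):

import csv
import datetime

# "33,101.00" -> 33101.0: strip the thousands separator before converting
price_text = soup.find("span", {"class": "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"}).text
price = float(price_text.replace(",", ""))

# "ab" suits the csv module on Python 2.7; on Python 3 use open("kio_prices.csv", "a", newline="")
with open("kio_prices.csv", "ab") as f:
    csv.writer(f).writerow([datetime.date.today().isoformat(), price])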
Keyur Potdar
  • Thank you very much for the help - when I used the code with the ".text" at the end of the html it printed out exactly the share price value. I do have a small question regarding the logic in your answer - are you saying the problem was that the way my Chrome browser was reading the html on the site and sending it to Python was incorrect? How did you get the information within the "headers" variable? Another small question - how is ".find" different to ".find_all"? Thank you again! – Rip_027 Feb 23 '18 at 11:03
  • Read [this for your first question](https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit) and [this for the second](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find). – Keyur Potdar Feb 23 '18 at 15:01
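
As a quick illustration of the .find vs. .find_all question from the comments (a self-contained toy example):

from bs4 import BeautifulSoup

doc = BeautifulSoup("<p><span>a</span><span>b</span></p>", "html.parser")
print(doc.find_all("span"))  # every match, as a list: [<span>a</span>, <span>b</span>]
print(doc.find("span"))      # only the first match: <span>a</span>
print(doc.find("em"))        # None when nothing matches (find_all would return [])

So .find is what you want when you expect a single element, and .find_all is for collecting every occurrence.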