
I've been scraping Bloomberg for currency prices using urllib2 and straightforward string functions to read the part of the HTML where the price is stored. It probably won't win any prizes for efficiency, but it has been suitable for my purposes. This is an extract from the code where the page is scraped.

    import urllib2  # Python 2 only

    # grab the HTML source as one big string
    response = urllib2.urlopen('https://www.bloomberg.com/quote/CHFGBP:CUR')
    page = response.read()

    # locate the HTML where the price is stored
    pricestart = page.find('meta itemprop="price" content=') + len('meta itemprop="price" content=')

    # take the next twenty characters
    price = page[pricestart:pricestart + 20]

    # find the data between double quotes in that substring
    pricesplit = price.split('"')

    # index 0 is the empty string before the opening quote,
    # so the price is at index 1; cast it to a float
    priceAsFloat = float(pricesplit[1])

    # and save it to the current prices dictionary
    pricesCurr[keys] = priceAsFloat
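
For anyone reading this on Python 3, the same extraction could be sketched with a regular expression rather than slicing a fixed 20 characters. This is only a sketch: `extract_price` and `fetch_price` are hypothetical names, and it assumes the page still serves a `meta itemprop="price" content="..."` tag, which may no longer be the case.

```python
import re
from urllib.request import urlopen

# Captures whatever sits between the quotes after content=
PRICE_PATTERN = re.compile(r'meta itemprop="price" content="([^"]+)"')


def extract_price(page):
    """Return the first price found in the HTML as a float, or None if absent."""
    match = PRICE_PATTERN.search(page)
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group(1).replace(",", ""))


def fetch_price(url):
    """Download the page at url and pull the price out of it (hypothetical helper)."""
    with urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_price(page)


# Works on any string, so it can be checked without hitting the network
sample = '<meta itemprop="price" content="0.7812">'
print(extract_price(sample))
```

Returning `None` when the tag is missing avoids the silent failure in the slicing version, where `find` returns -1 and the slice picks up unrelated text.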

I'd like to do the same thing for Yahoo Finance, as it updates far more frequently and gives the feeling of 'live' prices (I know they're delayed by 15 minutes).

However, the method that works on the Bloomberg HTML doesn't work for the Yahoo source.

Take this URL, for example: https://uk.finance.yahoo.com/quote/CHFJPY=X?p=GBPJPY=X

Inspecting the HTML returned by urllib2.urlopen, the current price isn't there in the text to scrape. Or at least I can't find it!

Can anyone offer any advice on how to go about scraping the Yahoo Finance HTML?

user4190374
  • Yahoo discourages web scraping of the finance data ( https://stackoverflow.com/questions/38355075/has-yahoo-finance-web-service-disappeared-api-changed-down-temporarily ) and instead prefers that you go to https://developer.yahoo.com/ where you can use APIs and their YQL (Yahoo Query Language). While this does not answer your question about scraping, it may be an alternative way to get your desired outcome. – Rookie Oct 12 '17 at 20:44
  • That's interesting! About a year ago I switched to scraping Bloomberg because another script I had, which got the spot price of gold via the API and YQL, ceased to work, and I read a load of threads saying the API had been discontinued. – user4190374 Oct 13 '17 at 08:31
  • I should have checked for myself - seems like it's still up and running. https://developer.yahoo.com/yql/console/?q=select%20Symbol%2C%20Close%20from%20yahoo.finance.historicaldata%20where%20symbol%20in%20(%22IBM%22%2C%20%22GOOG%22)%20and%20startDate%20%3D%20%222014-01-01%22%20and%20endDate%20%3D%20%222014-10-27%22&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys#h=select+Ask+from+yahoo.finance.quotes+where+symbol+in+(%22CHFUSD%3DX%22) – user4190374 Oct 13 '17 at 08:44

1 Answer


I have also been working with Yahoo Finance data. The value you're looking for is there, but it is buried. The following is an excerpt of the code I have been using to scrape Yahoo Finance:

    from bs4 import BeautifulSoup
    import urllib3 as url
    import certifi as cert


    def get_stock_price(name):
        http = url.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=cert.where())
        html_doc = http.request('GET', 'https://finance.yahoo.com/quote/' + name + '?p=' + name)
        soup = BeautifulSoup(html_doc.data, 'html.parser')
        return soup.find("span", class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)").get_text()

Here, name is the ticker symbol of the stock (e.g. 'tsla'). To find the appropriate value to scrape, I manually drilled down through the HTML until I found the element that highlighted the value I was searching for. The code above works with the site you provided.
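
To illustrate what "drilling down" to one element by its class looks like, here is a rough sketch that runs on a static snippet with no network access. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it has no dependencies; `SpanTextFinder`, `find_span_text` and the sample HTML are all made up for the example, and the class string is simply the one from the code above, which Yahoo may change at any time.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of the first <span> whose class attribute matches exactly."""

    def __init__(self, class_value):
        super().__init__()
        self.class_value = class_value
        self.depth = 0      # greater than 0 while inside the matching span
        self.done = False   # True once the matching span has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "span":
                self.depth += 1  # nested span inside the match
        elif tag == "span" and dict(attrs).get("class") == self.class_value:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def find_span_text(html, class_value):
    """Return the text of the first matching span, or None if there is none."""
    finder = SpanTextFinder(class_value)
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks).strip() if finder.done else None


# Hypothetical markup, only loosely modelled on a quote page
PRICE_CLASS = "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)"
sample = '<div><span class="%s">1.31</span><span class="other">GBP</span></div>' % PRICE_CLASS
print(find_span_text(sample, PRICE_CLASS))
```

Checking for `None` before using the result is worth doing in the real scraper too, since the lookup stops matching as soon as the class string changes.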

  • I noticed today that the finance API was down (returning error 999, request denied), so I thought I'd try your method as a backup. Your code works great, but the number doesn't seem to react to changes the way the live Yahoo page does. For example, I was looking at the pound/US dollar rate at https://finance.yahoo.com/quote/GBPUSD=X/ and the value returned by Beautiful Soup is always 1.31, whereas the Yahoo webpage jumps between 1.3122, 1.3121 and so on. Any ideas why? – user4190374 Nov 02 '17 at 14:33
  • 1
    for anyone else reading...the answer is to load the page through the browser so it's rendered...I used selenium – user4190374 Nov 02 '17 at 20:14
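
The browser-rendering approach mentioned in the last comment could look roughly like the sketch below. It is not the commenter's actual code: `fetch_rendered_price` and `parse_price` are hypothetical names, the CSS selector has to be supplied by the caller after inspecting the page, and it assumes Selenium 4 with Chrome and a matching driver available.

```python
def parse_price(text):
    """Convert displayed price text such as '1.3122' or '1,234.50' to a float."""
    return float(text.strip().replace(",", ""))


def fetch_rendered_price(url, css_selector, timeout=10):
    """Load url in headless Chrome and read the price once the element exists."""
    # Imported here so parse_price stays usable without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the element
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return parse_price(element.text)
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call might look like `fetch_rendered_price('https://finance.yahoo.com/quote/GBPUSD=X/', selector)`, where `selector` is whatever CSS selector currently identifies the price element. Starting a browser per request is slow, so for frequent polling it would probably be better to keep one driver open and re-read the element.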