I've been scraping Bloomberg for currency prices using urllib2 and straightforward string functions to read the part of the HTML where the price is stored. It probably won't win any prizes for efficiency, but it has been suitable for my purposes. This is an extract from the code where the page is scraped:
#grab the html source as a big string
response = urllib2.urlopen('https://www.bloomberg.com/quote/CHFGBP:CUR')
page = response.read()
#locate the html where price is stored
pricestart = page.find('meta itemprop="price" content=')+len('meta itemprop="price" content=')
#take the next twenty characters after that point
price = page[pricestart:pricestart+20]
#find the data between double quotes in that substring
pricesplit = price.split('"')
#element 1 of the resulting list is the price (element 0 is the empty string before the first quote), cast it to a float
priceAsFloat = float(pricesplit[1])
#and save it to the current prices dictionary
pricesCurr[keys] = priceAsFloat
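For what it's worth, the same extraction could probably be written more defensively with a regular expression, so that a missing tag is detected instead of producing a garbage slice. This is only a minimal sketch, assuming the page still contains a `meta itemprop="price" content="..."` tag as in the code above; `extract_price`, `PRICE_PATTERN`, and the sample string are hypothetical names and data made up for illustration.

```python
import re

# Matches the price meta tag searched for above and captures the quoted value
PRICE_PATTERN = re.compile(r'meta itemprop="price" content="([^"]+)"')

def extract_price(page):
    """Return the price as a float, or None if the tag is not in the page."""
    match = PRICE_PATTERN.search(page)
    if match is None:
        return None
    # Strip thousands separators in case the value is formatted like "1,234.5"
    return float(match.group(1).replace(',', ''))

# Hypothetical sample standing in for the string returned by response.read()
sample = '<meta itemprop="price" content="0.7812">'
print(extract_price(sample))
```

Returning None when the tag is absent makes it easy to tell "the page layout changed" apart from a real price.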
I'd like to do the same thing for Yahoo Finance, as it updates a lot more frequently and gives the feeling of 'live' prices (I know they're delayed by 15 minutes).
However, the method that works on the Bloomberg HTML doesn't work on the Yahoo source.
Take this URL, for example: https://uk.finance.yahoo.com/quote/CHFJPY=X?p=GBPJPY=X
When I inspect the HTML returned by urllib2.urlopen, the current price isn't there in the text to scrape. Or at least I can't find it!
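One way to check this is to search the downloaded source for the price currently shown in the browser and print the text around each hit. A rough sketch: `find_contexts` is a hypothetical helper, and the sample string only stands in for real page source (it is not what Yahoo actually returns).

```python
def find_contexts(page, needle, width=40):
    """Return the text surrounding every occurrence of needle in page."""
    contexts = []
    start = page.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = start + len(needle) + width
        contexts.append(page[lo:hi])
        # Continue searching just past the start of this hit
        start = page.find(needle, start + 1)
    return contexts

# Made-up sample; in practice this would be the string from response.read()
sample = '<script>{"price": 113.52}</script><span>113.52</span>'
for context in find_contexts(sample, '113.52', width=10):
    print(context)
```

An empty list would suggest the value is not in the static HTML at all, for example because it is filled in by JavaScript after the page loads, though the price may also simply have changed between the browser view and the download.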
Can anyone offer advice on how to go about scraping the Yahoo Finance HTML?