I am trying to port a script that requests fundamental data from the Yahoo Finance site. Rather than pulling entire reports, I want to look up specific items, such as the price-to-book ratio. I followed a tutorial from Sentdex on how to do that. The problem is that the example code is written for Python 2.7, and I am trying to make it work on Python 3 and then expand it with more features.

Here is what I have so far:

import time
import urllib
import urllib.request


sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']


def yahooKeyStats(stock):

    try:
        sourceCode = urllib.request.urlopen('http://finance.yahoo.com/q/ks?s='+stock).read()
        pbr = sourceCode.split('Price/Book (mrq):</td><td class="yfnc_tabledata1">')[1].split('</td>')[0]       
        print ('price to book ratio:'),stock,pbr

    except Exception as e:
        print ('failed in the main loop'),str(e)


for eachStock in sp500short:
    yahooKeyStats(eachStock)
    time.sleep(1)

I'm almost sure the problem is in the definition of the pbr variable, specifically the splitting part. The strings:

 Price/Book (mrq):</td><td class="yfnc_tabledata1">

And...:

</td>

...are just delimiters: the actual value I'm looking for sits between the two items listed above. So far, though, the script only prints the exception message when I run it.

Any help will be much appreciated. Cheers,

dude
  • Use an HTML parser such as BeautifulSoup. – Alex Hall May 21 '16 at 14:04
  • There are many ways to do what you want without scraping a webpage... Related: http://stackoverflow.com/questions/12433076/download-history-stock-prices-automatically-from-yahoo-finance-in-python – OneCricketeer May 21 '16 at 14:05

1 Answer

It looks like urllib.request.urlopen(...).read() is returning data of type bytes.

From the Python docs:

Note that urlopen returns a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the http server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding.

The split method is failing here because you are trying to split sourceCode, which is of type bytes, by a str delimiter. Try appending .decode() after .read(): decoding converts sourceCode from bytes to str. Alternatively, you could .encode() both of your delimiters so that everything stays bytes.

bytes.decode
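
As a minimal sketch of both options, the snippet below uses a hard-coded bytes value in place of the real response, so it runs without a network connection. The sample markup, the value 3.52, and the helper names extract_pbr and extract_pbr_bytes are assumptions made for illustration; the delimiters are the ones from the question.

```python
# Hypothetical stand-in for urllib.request.urlopen(url).read(), which returns bytes.
sample_page = b'<td>Price/Book (mrq):</td><td class="yfnc_tabledata1">3.52</td>'

# Delimiters taken from the question.
START = 'Price/Book (mrq):</td><td class="yfnc_tabledata1">'
END = '</td>'


def extract_pbr(raw_bytes, encoding='utf-8'):
    """Option 1: decode the bytes to str, then split on str delimiters."""
    text = raw_bytes.decode(encoding)
    return text.split(START)[1].split(END)[0]


def extract_pbr_bytes(raw_bytes):
    """Option 2: keep the data as bytes and encode the delimiters instead."""
    return raw_bytes.split(START.encode())[1].split(END.encode())[0]


# Splitting bytes by a str delimiter is what raises the exception in the question.
try:
    sample_page.split(START)
    split_failed = False
except TypeError:
    split_failed = True

print(extract_pbr(sample_page))        # a str
print(extract_pbr_bytes(sample_page))  # a bytes object
```

Option 1 is usually the more convenient of the two, since the extracted value is then an ordinary string. The utf-8 default is a guess; if the server declares a different charset, pass that encoding instead.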

Matt O'Connell
  • Good point!! I did it, and it seems the problem code block is now being processed, but still partially as I am receiving the 'price to book ratio' from the script, but not the value itself: sourceCode = urllib.request.urlopen('http://finance.yahoo.com/q/ks?s='+stock).read().decode() – dude May 21 '16 at 14:26
  • Your syntax for printing a string seems to be off in the example. It should be: `print('price to book ratio:',stock,pbr)` – Matt O'Connell May 21 '16 at 14:28
  • Thanks buddy!! That did it! :D – dude May 21 '16 at 14:31
  • Glad to help. Definitely check out the link provided by @cricket_007. Scraping web pages can be tricky for several reasons. One of the main ones would be that any time the author decides to change the format of the page (maybe he adds another class like: ``,) your scraper will break. Moving forward, you may want to look into Yahoo's finance API – Matt O'Connell May 21 '16 at 14:35
  • Absolutely. The only reason I decided to do it like that is that the good financial data APIs I have found provide, at least the ones I could find, technical data for analysis, and in the future I would like to retrieve both kinds, technical and fundamental, to have that automated. But I am just beginning all that and have a long way ahead of me... :) – dude May 21 '16 at 14:54
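
The print fix discussed in the comments can be sketched as follows. In Python 3, print is a function, so the question's form `print ('...'),stock,pbr` prints only the label and then builds a tuple that is thrown away. The values of stock and pbr below are made up for illustration, and the output is captured with redirect_stdout only so the two forms can be compared.

```python
import io
from contextlib import redirect_stdout

# Hypothetical values, for illustration only.
stock, pbr = 'aapl', '3.52'

# Python 2 style carried over to Python 3: only the label is printed;
# the rest of the line is a tuple expression (None, stock, pbr).
old_buffer = io.StringIO()
with redirect_stdout(old_buffer):
    leftover = (print('price to book ratio:'), stock, pbr)
old_output = old_buffer.getvalue()

# Python 3 style: every value is passed as an argument to print().
new_buffer = io.StringIO()
with redirect_stdout(new_buffer):
    print('price to book ratio:', stock, pbr)
new_output = new_buffer.getvalue()

print(repr(old_output))
print(repr(new_output))
```

The same applies to the except branch in the question: `print('failed in the main loop', str(e))` keeps the error text inside the call.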