Web scraping information other than price from Yahoo Finance in Python 3

Question

I'm new to Python, so I apologize for any rookie mistakes. I followed a tutorial on scraping stock prices with Python and got it working in Python 3. But when I tried to adapt it to other elements of the Yahoo Finance page, such as the P/E ratio and Beta, the output was just empty square brackets.

import urllib.request
import re

symbolslist = ["aapl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.request.urlopen(url)
    htmltext = htmlfile.read()
    regex = b'<th scope="row" width="48%">"P/E "<span class="small">(ttm)</span>:    </th><td class="yfnc_tabledata1">(.+?)</td>'
    pattern = re.compile(regex)
    price_to_earnings = str(re.findall(pattern,htmltext))
    print ("The price to earnings of " + symbolslist[i]+ " is " + price_to_earnings)
    i+=1

This was the output:

    The price to earnings of aapl is []
    The price to earnings of spy is []
    The price to earnings of goog is []
    The price to earnings of nflx is []
    >>>

When I was a beginner at programming I used regex to scrape, and it worked. After half a year of learning I became more comfortable and was able to move to BeautifulSoup, which is vastly superior for scraping. – appleLover, Sep 03 '13 at 01:00

score 0 · Answer 1 · answered Sep 02 '13 at 23:18


First, I would suggest using BeautifulSoup instead of regex. I hope this example helps you solve your problem, even though it's Python 2.7:

>>> import urllib2
>>> from bs4 import BeautifulSoup as bs4
>>> html_file = urllib2.urlopen("http://finance.yahoo.com/q?s=goog&q1=1")
>>> soup = bs4(html_file)
>>> for price in soup.find(attrs={'id':"yfs_l84_goog"}):
...     print price
... 
846.90
>>>
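The session above is Python 2.7 and needs BeautifulSoup installed. As a rough Python 3 sketch of the same idea (find the element with a given id and print its text), the version below swaps in the standard library's html.parser, so it is not the BeautifulSoup approach itself. The `TextById` class, the `text_by_id` helper, and the `SAMPLE_HTML` snippet are made up for illustration; the snippet only imitates the `yfs_l84_goog` element from the example and is not fetched from Yahoo.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id.

    Simplification: assumes every tag inside the target has a closing tag
    (a bare <br> inside it would throw the depth count off).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Hypothetical markup imitating the quote element; not real Yahoo output.
SAMPLE_HTML = (
    '<div><span id="yfs_l84_goog">846.90</span> '
    '<span id="other">1.0</span></div>'
)


def text_by_id(html, target_id):
    """Return the stripped text of the element with this id, or '' if absent."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(text_by_id(SAMPLE_HTML, "yfs_l84_goog"))
```

To run it against a live page, the string passed to `text_by_id` would come from `urllib.request.urlopen(url).read().decode()`; whether the id still exists on the current page is something to check in the page source.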

answered Sep 02 '13 at 23:18

Vor


Thanks. Will look into Beautiful Soup. With regex though, I think one of the problems might be the code I get from Yahoo (inspect element) in the first place: the code for price works, but for P/E or Beta the code is in a completely different format featuring undefined classes. – user2741092 Sep 03 '13 at 18:35
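One thing worth checking in the regex from the question, independent of the page format: `(ttm)` is written without backslashes, so `re` treats it as a capture group matching the letters `ttm` rather than the literal text `(ttm)`. The quotes around `"P/E "` may also come from how the browser inspector displays text nodes rather than from the HTML itself. The sketch below demonstrates the escaping issue on a made-up row; `SAMPLE` and the value 12.34 are invented, and the real page markup may differ.

```python
import re

# Hypothetical row modelled on the markup quoted in the question.
SAMPLE = (
    b'<th scope="row" width="48%">P/E <span class="small">(ttm)</span>: </th>'
    b'<td class="yfnc_tabledata1">12.34</td>'
)

# Unescaped: "(ttm)" is a group that matches just "ttm", so the literal
# parentheses in the page are never matched and findall returns [].
unescaped = re.compile(
    rb'<span class="small">(ttm)</span>: </th>'
    rb'<td class="yfnc_tabledata1">(.+?)</td>'
)

# Escaped: "\(" and "\)" match literal parentheses; "\s*" tolerates
# whatever whitespace sits before </th>.
escaped = re.compile(
    rb'<span class="small">\(ttm\)</span>:\s*</th>'
    rb'<td class="yfnc_tabledata1">(.+?)</td>'
)

print(unescaped.findall(SAMPLE))
print(escaped.findall(SAMPLE))
```

Because the input is bytes, the captured value is bytes too; calling `.decode()` on it gives a normal string for printing.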

score 0 · Answer 2 · answered Jan 02 '14 at 15:57

Use Yahoo Finance's CSV format rather than HTML, then use a CSV reader (such as Python's csv.reader) to parse the results.

For details on the CSV format, see here. However, the Yahoo Finance URL has changed since that document was written: use http://download.finance.yahoo.com instead of http://finance.yahoo.com.
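As a minimal sketch of the parsing half of this suggestion, the snippet below reads CSV text with the standard library's `csv.reader`. It deliberately parses an in-memory sample instead of downloading anything: the column layout (symbol, P/E, beta), the numbers, and the `parse_quotes` helper are all assumptions made for illustration, not the documented Yahoo format.

```python
import csv
import io

# Invented sample data; the real column order depends on the request made.
SAMPLE_CSV = '"AAPL",12.34,0.95\n"GOOG",25.10,1.05\n'


def parse_quotes(text):
    """Map each symbol to its P/E and beta, assuming three columns per row."""
    quotes = {}
    for symbol, pe, beta in csv.reader(io.StringIO(text)):
        quotes[symbol] = {"pe": float(pe), "beta": float(beta)}
    return quotes


quotes = parse_quotes(SAMPLE_CSV)
for symbol, values in quotes.items():
    print("The price to earnings of " + symbol + " is " + str(values["pe"]))
```

With a live download, the text would come from decoding the response body; fields that are not numeric (for example a placeholder for missing data) would need handling before the `float` conversion.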

score 0 · Answer 3 · answered Sep 11 '14 at 15:29

I am having the same problem. When I go to http://download.finance.yahoo.com, I get redirected to http://finance.yahoo.com, so it appears the CSV format link has been shut down by Yahoo.

The issue seems to be that the URL is too long and convoluted, which they may have done so that we couldn't keep scraping their data this way. Is there a different way to go about this? I tried scraping finance.msn.com as well, but ran into the same issue: the URL is too long and convoluted.

Perhaps I just need to look for another finance site that is less known. I'll see what I can find.
