0

I'm new to Python, so I apologize for any rookie mistakes. I followed a tutorial on scraping stock prices with Python. After fixing it to work in Python 3, I tried to adapt it to other elements of the Yahoo Finance page, such as the P/E ratio and beta, but the output was just empty square brackets.

import urllib.request
import re

symbolslist = ["aapl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.request.urlopen(url)
    htmltext = htmlfile.read()
    regex = b'<th scope="row" width="48%">"P/E "<span class="small">(ttm)</span>:    </th><td class="yfnc_tabledata1">(.+?)</td>'
    pattern = re.compile(regex)
    price_to_earnings = str(re.findall(pattern,htmltext))
    print ("The price to earnings of " + symbolslist[i]+ " is " + price_to_earnings)
    i+=1

This was the output:

    The price to earnings of aapl is []
    The price to earnings of spy is []
    The price to earnings of goog is []
    The price to earnings of nflx is []
    >>> 
  • Post your current code. – Blender Sep 02 '13 at 21:42
  • When I was new to programming I used regex to scrape. It worked. After half a year of learning I became more comfortable and was able to move to BeautifulSoup, which is vastly superior for scraping. – appleLover Sep 03 '13 at 01:00

3 Answers

0

First, I would suggest using BeautifulSoup instead of regex. I hope this example helps you solve your problem, even though it is written for Python 2.7:

>>> import urllib2
>>> from bs4 import BeautifulSoup as bs4
>>> html_file = urllib2.urlopen("http://finance.yahoo.com/q?s=goog&q1=1")
>>> soup = bs4(html_file)
>>> for price in soup.find(attrs={'id':"yfs_l84_goog"}):
...     print price
... 
846.90
>>> 
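
Since the question targets Python 3, here is a minimal sketch of the same lookup in Python 3. It assumes the bs4 package is installed and uses a hypothetical sample string in place of the live page, whose markup may have changed; the helper name extract_text_by_id is made up for illustration. On a live page, the HTML would come from urllib.request.urlopen(url).read() instead of urllib2.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the fetched page; the real
# page structure is an assumption and may differ.
SAMPLE_HTML = '<html><body><span id="yfs_l84_goog">846.90</span></body></html>'


def extract_text_by_id(html, element_id):
    """Return the text of the first element with the given id, or None."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(attrs={"id": element_id})
    return tag.get_text(strip=True) if tag is not None else None


price = extract_text_by_id(SAMPLE_HTML, "yfs_l84_goog")
print(price)
```

Returning None when the id is missing makes it easy to tell "element not found" apart from an empty value, which is harder to see with a bare findall.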
Vor
  • Thanks. Will look into Beautiful Soup. With regex, though, I think one of the problems might be the markup I get from Yahoo (via inspect element) in the first place: the markup for price works, but for P/E or Beta it is in a completely different format featuring undefined classes. – user2741092 Sep 03 '13 at 18:35
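
On the regex itself, one likely contributor to the empty lists is that the pattern in the question leaves the parentheses in (ttm) unescaped, so the regex engine treats them as a capture group rather than literal characters. The sketch below illustrates this on a hypothetical HTML fragment; the actual markup served by Yahoo is an assumption and cannot be verified here. The literal double quotes around "P/E " in the original pattern would also have to appear in the page for it to match.

```python
import re

# Hypothetical fragment modelled on the pattern in the question; the real
# page markup is an assumption.
SAMPLE = (
    b'<th scope="row" width="48%">P/E <span class="small">(ttm)</span>: </th>'
    b'<td class="yfnc_tabledata1">12.34</td>'
)

# Unescaped parentheses form a group that matches "ttm" WITHOUT the
# surrounding literal parentheses, so this never matches the sample.
unescaped = re.compile(
    rb'<span class="small">(ttm)</span>: </th><td class="yfnc_tabledata1">(.+?)</td>'
)

# Escaping the parentheses matches them literally; \s* tolerates a varying
# amount of whitespace before the closing </th>.
escaped = re.compile(
    rb'<span class="small">\(ttm\)</span>:\s*</th><td class="yfnc_tabledata1">(.+?)</td>'
)

no_match = unescaped.findall(SAMPLE)
match = [m.decode() for m in escaped.findall(SAMPLE)]
print(no_match, match)
```

Matching exact whitespace and attribute order is still fragile, which is part of why an HTML parser is usually the more robust choice.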
0

Use Yahoo Finance's CSV format rather than HTML, then use a CSV reader (in Python, csv.reader from the standard library) to parse the results.

For details on the CSV format, see here. However, the Yahoo Finance URL has changed since that document was written: use http://download.finance.yahoo.com instead of http://finance.yahoo.com.
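
As a rough sketch of the parsing step, the snippet below feeds CSV text to csv.reader. The sample text, its column layout (symbol, then P/E), and the helper name parse_quotes are all hypothetical; the actual columns depend on the parameters sent with the request, which are not shown here.

```python
import csv
import io

# Hypothetical response body: one row per symbol, with the symbol in the
# first column and the P/E ratio in the second. The real layout is an
# assumption.
SAMPLE_CSV = '"AAPL",12.34\n"GOOG",25.10\n'


def parse_quotes(csv_text):
    """Map each symbol to its numeric value, skipping blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: float(row[1]) for row in reader if row}


quotes = parse_quotes(SAMPLE_CSV)
print(quotes)
```

With a live response, the bytes returned by urllib.request.urlopen(url).read() would need to be decoded to text before being passed to parse_quotes.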

Jimothy
0

I am having the same problem: when I go to http://download.finance.yahoo.com, I get redirected to http://finance.yahoo.com, so it appears Yahoo has shut down the CSV download link.

The issue seems to be that the URL is too long and convoluted, which they may have done deliberately so we couldn't keep scraping their data this way. Is there a different way to go about this? I tried scraping finance.msn.com as well, but ran into the same issue: the URL is too long and convoluted.

Perhaps I just need to look for another finance site that is less known. I'll see what I can find.

byoungdale