1

I am trying to scrape stock quotes from Google Finance. I am using the answer found here: How to create a stock quote fetching app in python, and it's working fine, but only for Google. I am new to regex; I can see what needs to be changed, but I am not sure how to do it.

The code below parses the page for the Google quote to get the current price.

m = re.search('id="ref_694653_l".*?>(.*?)<', content)

The 694653 is specific to Google, though. For Zynga (ZNGA), it should be looking for:

<span id="ref_481720736332929_l">3.57</span>

I want to have a regular expression that searches for

id="ref_SOME_NUMBER_l">SOME_PRICE

Any help would be greatly appreciated!

Community
IamAdamCooke
    Try [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) instead. It’s much easier to use BeautifulSoup for extraction of information from HTML than it is to craft a complicated regular expression that may or may not work in every case. – Ry- Mar 11 '13 at 02:01

3 Answers

2

Scraping HTML from another site is rarely the best solution. APIs were built for a reason. Check out https://stackoverflow.com/a/10040996/254973 if you want machine-readable financial data.

If you insist on scraping the HTML, use a library like the one @minitech mentioned. You should never try to parse HTML with regex. Read more here
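As a rough illustration of the parser-based approach, here is a minimal sketch that uses only Python 3's standard-library `html.parser` rather than a third-party package. The names `QuoteParser` and `parse_price` are hypothetical, and the sample markup is an assumption modeled on the span shown in the question, not the actual Google Finance page.

```python
import re
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collects the text of the first <span> whose id looks like ref_<digits>_l."""

    ID_RE = re.compile(r'^ref_\d+_l$')

    def __init__(self):
        super().__init__()
        self._in_quote = False  # True while inside the matching <span>
        self.price = None       # text of the first matching span, once found

    def handle_starttag(self, tag, attrs):
        if tag == 'span' and self.price is None:
            span_id = dict(attrs).get('id') or ''
            self._in_quote = bool(self.ID_RE.match(span_id))

    def handle_data(self, data):
        if self._in_quote and self.price is None:
            self.price = data.strip()

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_quote = False


def parse_price(html):
    """Return the quote text found in `html`, or None if no span matches."""
    parser = QuoteParser()
    parser.feed(html)
    return parser.price


# Assumed sample markup, built around the span from the question.
sample = '<div><span class="pr"><span id="ref_481720736332929_l">3.57</span></span></div>'
print(parse_price(sample))
```

Because the parser checks the tag name and the `id` attribute separately, it does not depend on attribute order or on whitespace inside the tag.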

Community
Steven V
0

Just do it the right way:

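import re
import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead

from bs4 import BeautifulSoup

def get_quote(symbol):
    # Fetch the Google Finance page for the given ticker symbol.
    url = 'http://finance.google.com/finance?q=' + symbol
    soup = BeautifulSoup(urllib2.urlopen(url))

    # The price lives in a <span> whose id looks like ref_<digits>_l.
    return float(soup.find('span', id=re.compile(r'ref_\d+_l')).get_text())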

Regex is not really the answer if you can parse the HTML and do it just as easily.

Blender
0
# Capture only the price, not the id attribute.
match = re.search(r'<span id="ref_\d+_l">(\d*\.?\d*)</span>', content)
print(match.group(1))
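For reference, a self-contained sketch of the generalized pattern the question asks for, run against static strings instead of a live page. The helper name `extract_price` is hypothetical, the Zynga span is copied from the question, and the second sample (including its price) is made up for illustration. It is written for Python 3.

```python
import re

# Matches id="ref_<digits>_l", skips any remaining attributes up to '>',
# then captures the text before the next '<' as the price.
PRICE_RE = re.compile(r'id="ref_\d+_l"[^>]*>([^<]+)<')


def extract_price(content):
    """Return the first quoted price in `content` as a float, or None."""
    match = PRICE_RE.search(content)
    if match is None:
        return None
    # Strip thousands separators so values like "1,234.50" convert cleanly.
    return float(match.group(1).replace(',', ''))


zynga = '<span id="ref_481720736332929_l">3.57</span>'  # span from the question
other = '<span id="ref_694653_l">1,234.50</span>'       # made-up price, illustration only

print(extract_price(zynga))
print(extract_price(other))
```

Returning `None` when nothing matches avoids the `AttributeError` that calling `.group(1)` on a failed search would raise.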
waitingkuo