import urllib
import re

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for i in range(len(stocks_symbols)):
    htmlfile = urllib.urlopen("https://finance.yahoo.com/q?s=" + stocks_symbols[i])
    htmltext = htmlfile.read(htmlfile)
    regex = '<span id="yfs_l84_' + stocks_symbols[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)
    regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
    pattern1 = re.compile(regex1)
    name1 = re.findall(pattern1, htmltext)
    print "Price of", stocks_symbols[i].upper(), name1, "is", price[0]
I guess the problem is in regex1:
regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
I tried reading the documentation but was unable to figure it out.
In this program I am trying to scrape the stock name and stock price, given a list of stock symbols as input.
I think what I am doing wrong is using two (.+?) groups in one pattern, which seems incorrect.
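For reference, two `(.+?)` groups in one pattern are allowed on their own. Here is a minimal sketch, assuming a made-up HTML fragment (`sample_html` is hypothetical and is not the real Yahoo markup), that shows `re.findall` returning one tuple per match when the pattern has two groups:

```python
import re

# Hypothetical fragment for illustration; the real page markup may differ.
sample_html = '<h2 id="yui_3_9_1_9_1234">Apple Inc. (AAPL)</h2>'

# Two capturing groups in a single pattern are valid.
pattern = re.compile('<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>')

# With more than one group, findall returns a list of tuples,
# each tuple holding one string per group.
matches = pattern.findall(sample_html)
id_suffix, name = matches[0]
```

So the number of groups is probably not what makes `re.compile` fail here.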
Output:
Traceback (most recent call last):
  File "C:\Py\stock\stocks.py", line 14, in <module>
    pattern1 = re.compile(regex1)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
error: nothing to repeat
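The "nothing to repeat" message can be reproduced without any network access. The sketch below assumes the cause is the `^?` inside the first group: `^` is a zero-width anchor, so the `?` quantifier that follows it has nothing to apply to. The helper `compile_error` is a hypothetical name introduced only for this illustration.

```python
import re

def compile_error(regex):
    """Return the re.error message for regex, or None if it compiles."""
    try:
        re.compile(regex)
    except re.error as exc:
        return str(exc)
    return None

# Smallest form of the problem: '?' directly after the '^' anchor.
minimal_message = compile_error('(.^?)')

# The pattern from the question fails the same way; the parser stops
# at '^?' before it ever reaches the extra ')'.
question_message = compile_error('<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>')

# Using '.+?' instead of '.^?' and dropping the stray ')' compiles.
fixed_message = compile_error('<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>')
```

A pattern that compiles is not guaranteed to match anything on the actual page, so the markup still needs to be checked separately.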
I don't see an `<h2>` with `id="yui_3_9_1_9_`; there is only a plain `<h2>(.+?)</h2>` followed by `-NasdaqGS`.
– furas Jul 05 '14 at 15:28
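If the heading on the page really carries no `id` attribute, matching a bare `<h2>` may be enough. This is only a sketch: `sample_html` is a hypothetical fragment loosely modelled on the comment, not the actual page source, and the first `<h2>` on the real page is not guaranteed to be the stock name.

```python
import re

# Hypothetical fragment; the surrounding tags and classes are assumptions.
sample_html = (
    '<div class="title"><h2>Apple Inc. (AAPL)</h2> '
    '<span>-NasdaqGS</span></div>'
)

# Match an <h2> that has no attributes and capture its text lazily.
name_pattern = re.compile(r'<h2>(.+?)</h2>')
names = name_pattern.findall(sample_html)

# Guard against an empty result instead of indexing blindly.
name = names[0] if names else None
```

Checking for an empty list before indexing avoids an `IndexError` when the markup changes.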