I am searching an HTML page for a string ("s") that has this format:
<td class="number">$0.48</td>
I am trying to extract the "$0.48" with a regex. It was working until today and I have no idea what changed. Here is the relevant snippet of my code:
import re
import urllib2
from time import sleep

def scrubdividata(ticker):
    sleep(1.0)  # Pause between requests; time in seconds.
    f = urllib2.urlopen('the url')
    lines = f.readlines()
    for i in range(0, len(lines)):
        line = lines[i]
        if "Annual Dividend:" in line:
            print 'for ticker %s, annual dividend is in line' % (ticker)
            # The value sits on the line after the "Annual Dividend:" label.
            s = str(lines[i+1])
            print s
            start = '>$'
            end = '</td>'
            AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
Here is the result:
for ticker A, annual dividend is in line
<td class="number">$0.48</td>
Traceback (most recent call last):
  File "test.py", line 115, in <module>
    scrubdividata(ticker)
  File "test.py", line 34, in scrubdividata
    LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
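The failure can be reproduced without the network call. Below is a minimal sketch, assuming the line is exactly the string printed above plus a trailing newline. One thing worth checking is that "$" is a regex metacharacter (an end-of-string anchor), so the unescaped start delimiter may never match a literal dollar sign. The names raw_match, escaped_match and annual_div are only for illustration, and re.escape is one possible way to test the idea, not necessarily the right fix for the real page:

```python
import re

# Assumed input: the line printed in the output above, with a trailing newline.
s = '<td class="number">$0.48</td>\n'
start = '>$'
end = '</td>'

# Unescaped delimiters: '$' is read as an anchor, not a literal dollar sign.
raw_match = re.search('%s(.*)%s' % (start, end), s)

# Escaped delimiters: re.escape() makes every character literal.
escaped_match = re.search('%s(.*?)%s' % (re.escape(start), re.escape(end)), s)

# Check for None before calling .group() to avoid the AttributeError.
annual_div = None
if escaped_match is not None:
    annual_div = escaped_match.group(1)

print(raw_match)
print(annual_div)
```

Because the "$" is part of the start delimiter, the captured group here is the number without the dollar sign.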
I am using Python 2.5 (I believe). I have heard that you should never use regex on HTML, but I needed to get this done quickly with my limited knowledge, and regex is the only approach I know. Am I now suffering the consequences of that, or is some other issue causing this? Any insights would be wonderful!
Thanks, B