#http://finance.yahoo.com/q?s=spy

import urllib.request
import re


htmlfile = urllib.request.urlopen("http://finance.yahoo.com/q?s=spy")
htmltext = htmlfile.read().decode("utf-8")

regex = re.compile('<span id="yfs_184_spy">(.+?)</span>')
regex1 = re.compile('<span id="yfs_l84_spy">(.+?)</span>')
regex2 = re.compile('<span id="yfs_184_spy">(.+?)</span>')

price = re.findall(regex, htmltext)
price2 = re.findall(regex1, htmltext)
price3 = re.findall(regex2, htmltext)
price4 = re.findall(regex, htmltext)

print(price)
print(price2)
print(price3)
print(price4)

The code above returns this result:

[]
['197.55']
[]
[]

I have no idea why the other patterns return no matches (price, price3 and price4 are all empty lists). The pattern used for price2 was copied from the page source of the URL and pasted into the editor, and it works. When I type the same HTML out by hand, for some reason it matches nothing. Thanks so much for any help in advance.

acda
  • You should take a look at [this](http://stackoverflow.com/questions/1732348) – Chaker Aug 31 '15 at 19:47
  • I would not recommend using regex for parsing html. I know that's not really the issue here, but there are much better tools for the job (BeautifulSoup, HTMLParser, etc) – Chad S. Aug 31 '15 at 19:49
  • It looks like you typed a lowercase L in regex1 and a number one in the others. – BrenBarn Aug 31 '15 at 19:50
  • thank you so much for the help guys. i really appreciate it – acda Aug 31 '15 at 20:07
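
As a rough illustration of the parser-based approach suggested in the comments, here is a minimal sketch using the standard library's `html.parser`. The class name `SpanTextById` and the sample HTML are hypothetical; the sample is only modeled on the span id from the question, not taken from the actual page, and the sketch does not handle nested spans.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False  # True while we are between the matching <span> and its </span>
        self.text = None      # stays None if no matching span is found

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only the first matching span is used
        if tag == "span" and self.text is None and dict(attrs).get("id") == self.target_id:
            self._inside = True
            self.text = ""

    def handle_data(self, data):
        if self._inside:
            self.text += data

    def handle_endtag(self, tag):
        # Assumes the target span contains no nested <span> elements
        if tag == "span" and self._inside:
            self._inside = False


# Made-up sample markup, assumed to resemble the page in the question
sample = '<div><span id="yfs_l84_spy">197.55</span></div>'
parser = SpanTextById("yfs_l84_spy")
parser.feed(sample)
print(parser.text)  # 197.55
```

A mistyped id still finds nothing here (`parser.text` stays `None`), so this does not replace getting the string right; it only avoids depending on the exact layout of the tag.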

1 Answer

The reason only one of your regexes finds the string is that it's the only one that matches. In the first and third patterns, the id is "yfs_184_spy": the three characters in the middle are "one eight four". In the second, the id is "yfs_l84_spy": that's "el eight four". The code font masks the difference in the question. Get the string right and you'll have better results.
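
If two strings look identical in your editor, comparing them character by character makes the difference visible. A minimal sketch, using the two ids from the question; the helper name `first_difference` is made up for illustration, and it only compares the overlapping part of the two strings:

```python
import unicodedata


def first_difference(a, b):
    """Return (index, char_a, char_b) for the first position where a and b differ, else None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    return None


typed = "yfs_184_spy"   # typed by hand: digit one
copied = "yfs_l84_spy"  # copied from the page source: lowercase L

diff = first_difference(typed, copied)
if diff is not None:
    index, ca, cb = diff
    # unicodedata.name spells out characters that look alike in some fonts
    print(index, unicodedata.name(ca), "vs", unicodedata.name(cb))
    # 4 DIGIT ONE vs LATIN SMALL LETTER L
```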

Erik Johnson
  • thanks so much. i really appreciate you taking the time out to help a noob, like myself. – acda Aug 31 '15 at 20:07