So I've been working on a simple script that pulls stock symbols from a .txt file in the project's main directory, but I can't get it to bring back the pricing data. It works if I manually put the symbols into a string array, but when it pulls them from the file it just doesn't return the prices.

import urllib
import re

symbolfile = open("symbols.txt")
symbolslist = symbolfile.read()
newsymbolslist = symbolslist.split("\n")

i = 0

while i<len(newsymbollist):
    url = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s=" +symbolslist[i] +""
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_184_' +newsymbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "The price of", newsymbolslist[i] ," is ", price
    i+=1

I could really use some help, because the shell doesn't show any errors that would explain why.

Thanks in advance for any help!

AdmPicard
Recypher
  • What is the question here? What is your current output? – mid Jan 10 '16 at 09:29
  • Could you provide a few lines of your _txt_ as well, the output you get and what you want to receive? – AdmPicard Jan 10 '16 at 09:35
  • Related: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Łukasz Rogalski Jan 10 '16 at 09:45
  • There are a number of things you could be doing to improve this, but here it's actually a simple typo. The `span` ID prefix is `yfs_l84_` - note the letter "l", not the number "1" in "l84". – Linus Thiel Jan 10 '16 at 10:20
  • When it runs, it just prints out "The price of AAPL is []" – Recypher Jan 10 '16 at 19:12
  • That's the only symbol in the file right now, but I've tried many others and it doesn't seem to work with any of them – Recypher Jan 10 '16 at 19:13

1 Answer

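With the fix suggested by @Linus Gustav Larsson Thiel in the comments, plus one more change to the regex, your code returns correct results. Note the lower() call in the regex: the page source uses lowercase symbols in the span IDs. The URL is also built from newsymbolslist[i] rather than symbolslist[i], because symbolslist is the raw file contents (a string), so indexing it yields a single character: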

i = 0

while i < len(newsymbolslist):
    url = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s=" +newsymbolslist[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_l84_' +newsymbolslist[i].lower() +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = pattern.findall(htmltext)
    print "The price of", newsymbolslist[i] ," is ", price
    i+=1

With a static test list, ['AAPL','GOOGL','MSFT'], I get the following output:

The price of AAPL  is  ['98.53']
The price of GOOGL  is  ['733.07']
The price of MSFT  is  ['52.30']
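
To see why the prefix and the case both matter without hitting the network, here is a minimal offline sketch. The sample_html string and the find_prices helper are made up for illustration; the markup only imitates the span described above and the real page may look different.

```python
import re

# Hypothetical snippet imitating the span described above; the real page may differ.
sample_html = '<span id="yfs_l84_aapl">98.53</span>'

def find_prices(html, symbol, prefix):
    """Return every capture from the span whose id is prefix + lowercased symbol."""
    span_id = re.escape(prefix + symbol.lower())
    pattern = re.compile('<span id="' + span_id + '">(.+?)</span>')
    return pattern.findall(html)

wrong = find_prices(sample_html, "AAPL", "yfs_184_")  # digit one: matches nothing
right = find_prices(sample_html, "AAPL", "yfs_l84_")  # letter l: matches the span

print(wrong)
print(right)
```

The first call comes back empty, which is the same symptom as the "[]" in the question; the second returns the captured price.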

If you want to, you might simplify your code as well:

baseurl = "http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl1&fr=&type=2button&s="

for symbol in newsymbolslist:
    url = baseurl + symbol
    source = urllib.urlopen(url).read()
    regex = re.compile('<span id="yfs_l84_' + symbol.lower() + '">(.+?)</span>')
    price = regex.findall(source)[0]
    print "The price of", symbol, "is", price

The for ... in ... loop eliminates the need for a counter variable. Since findall() returns a list of matches and you expect only one, you can append [0] to get the matched string itself rather than a one-element list.

This prints the following:

The price of AAPL is 98.53
The price of GOOGL is 733.07
The price of MSFT is 52.30
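
One caveat: [0] raises an IndexError when nothing matches, and split("\n") leaves an empty string in the list if the file ends with a newline. A hedged sketch of one way to guard against both follows. The helper names load_symbols and first_price are hypothetical, and the inline strings stand in for symbols.txt and a downloaded page.

```python
import re

def load_symbols(text):
    """Split file contents into symbols, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def first_price(html, symbol):
    """Return the first captured price for symbol, or None when nothing matches."""
    pattern = '<span id="yfs_l84_' + re.escape(symbol.lower()) + '">(.+?)</span>'
    match = re.search(pattern, html)
    return match.group(1) if match else None

# Hypothetical data standing in for symbols.txt and a fetched page.
symbols = load_symbols("AAPL\r\nMSFT\n\n")
page = '<span id="yfs_l84_aapl">98.53</span>'

for symbol in symbols:
    print("%s: %s" % (symbol, first_price(page, symbol)))
```

Returning None for a missing match lets the loop continue with the remaining symbols instead of stopping at the first page that lacks the expected span.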
AdmPicard