Stock Prices Not Scraping With Python

Question

Used this tutorial to scrape stock prices: https://www.youtube.com/watch?v=f2h41uEi0xU

There are some similar questions, but their answers only offer workarounds. For learning purposes, I want to know how to fix this specific code.

Web scraping information other than price from Yahoo Finance in Python 3

Using Regex to get multiple data on single line by scraping stocks from yahoo

I understand there are better ways to do this; however, these videos are helpful for learning.

Everything seems to work, but the prices are not being retrieved from the site! My code is exactly the same as his. I am using Python Launcher (Mac) with Python 2.7 (I tried 3.4 as well) to run the program.

Here's my code:

import urllib
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]
i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id ="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "the price of" , symbolslist[i], " is " ,price
    i+=1

If you are going to learn you may as well learn the correct way, parsing html with regex is not a good idea — Padraic Cunningham, Aug 09 '14 at 18:31
You can also iterate over the elements of symbolslist directly, using range is redundant. `for i in symbolslist`, then just use i instead of `symbolslist[i]` — Padraic Cunningham, Aug 09 '14 at 18:38

Tamim Shahriar · Answer 1 · 2014-08-09T18:31:19.367

2

There is an extra space after `id` in your regex (`id =` instead of `id=`). The corrected regex is in the sample code below.
Also, `price` is a list, so to get the price itself you need to use `price[0]`.

Sample code:

>>> regex = '<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
>>> pattern = re.compile(regex)
>>> price = re.findall(pattern, htmltext)
>>> price
[u'568.77']
>>> price[0]
u'568.77'
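
To see the effect of that one space without touching the network, here is a small sketch that runs both patterns against a made-up fragment. The `sample_html` string is an assumption standing in for the real page, whose markup may differ. The names `broken` and `fixed` are only for illustration. It uses nothing but `re` and should run on both Python 2.7 and 3.

```python
import re

# Hypothetical fragment standing in for the downloaded page (assumption).
sample_html = '<div><span id="yfs_l84_goog">568.77</span></div>'
symbol = "goog"

# Pattern from the question: note the space between id and =
broken = re.compile('<span id ="yfs_l84_' + symbol + '">(.+?)</span>')
# Same pattern with the space removed
fixed = re.compile('<span id="yfs_l84_' + symbol + '">(.+?)</span>')

broken_matches = broken.findall(sample_html)
fixed_matches = fixed.findall(sample_html)

# findall returns a list, so take the first element only if there is one
price = fixed_matches[0] if fixed_matches else None

print(broken_matches)
print(fixed_matches)
print(price)
```

The pattern with the space finds nothing because a regex matches the text character by character, so `id =` never matches `id=` in the page.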

edited Aug 09 '14 at 18:31

answered Aug 09 '14 at 18:25

Tamim Shahriar

1

the space is the problem not using `symbolslist[i]` – Padraic Cunningham Aug 09 '14 at 18:30
Yes, space is the problem and also price[0] should be used. I used goog instead of symbolslist[i] to check quickly if it works. – Tamim Shahriar Aug 09 '14 at 18:32
well probably better to use the code as the OP has to avoid confusion – Padraic Cunningham Aug 09 '14 at 18:33

score 1 · Answer 2 · edited May 23 '17 at 11:57

It is never a good idea to parse HTML using regular expressions. I suggest letting a parser such as BeautifulSoup or lxml do the parsing for you. I would also replace the while loop with a for loop, as in the code below: you define i and increment it by hand anyway, so a for loop makes more sense here.

As for what is wrong with your regex, Tamim is right: you have an extra space in the `id=` part of your expression.

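import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
from bs4 import BeautifulSoup

symbolslist = ["aapl", "spy", "goog", "nflx"]
# Iterate over the symbols directly; no index variable is needed
for symbol in symbolslist:
    url = "http://finance.yahoo.com/q?s=" + symbol + "&q1=1"
    htmltext = urllib.urlopen(url).read()
    # Name the parser explicitly so bs4 does not have to pick one
    bs = BeautifulSoup(htmltext, "html.parser")
    # The price is expected in <span id="yfs_l84_SYMBOL">
    tag = bs.find('span', {'id': 'yfs_l84_' + symbol})
    # find() returns None when no matching tag exists
    price = tag.text if tag is not None else None
    print "the price of", symbol, "is", price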
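
To try the parser approach offline, here is a rough sketch that swaps BeautifulSoup for the standard library's `html.parser`, so it needs no third-party install. This is not the code from the answer: the class name `SpanTextById`, the helper `find_price`, and the `sample_html` fragment are all hypothetical, and the import path is the Python 3 one. The fragment deliberately contains `id ="..."` with the stray space to show that a parser does not care about it.

```python
from html.parser import HTMLParser  # Python 3 standard library


class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        HTMLParser.__init__(self)
        self.target_id = target_id
        self.inside = False   # True while we are inside the wanted span
        self.text = None      # stays None if the span is never found

    def handle_starttag(self, tag, attrs):
        if self.text is None and tag == "span":
            if dict(attrs).get("id") == self.target_id:
                self.inside = True

    def handle_data(self, data):
        if self.inside:
            self.text = (self.text or "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self.inside = False


def find_price(html, symbol):
    """Return the span text for symbol, or None if it is missing."""
    parser = SpanTextById("yfs_l84_" + symbol)
    parser.feed(html)
    parser.close()
    return parser.text


# Made-up fragment (assumption); note the space in 'id ='
sample_html = '<div><span id ="yfs_l84_goog">568.77</span></div>'

found = find_price(sample_html, "goog")
missing = find_price(sample_html, "aapl")
print(found)
print(missing)
```

The sketch ignores nested spans and only keeps the first match, which is enough for a page with one price element per symbol.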
