
I used this tutorial to scrape stock prices: https://www.youtube.com/watch?v=f2h41uEi0xU

There are some similar questions (listed below), but their answers only offer workarounds. For learning purposes, I want to know how to fix this particular code.

Web scraping information other than price from Yahoo Finance in Python 3

Using Regex to get multiple data on single line by scraping stocks from yahoo

I understand there are better ways to do this; however, these videos are helpful for learning.

The script runs, but it doesn't retrieve the prices from the site, even though my code is identical to the tutorial's. I run the program with Python Launcher (Mac) using Python 2.7 (I tried 3.4 as well).

Here's my code:

import urllib
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]
i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id ="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print "the price of" , symbolslist[i], " is " ,price
    i+=1

2 Answers

  1. There is an extra space between id and = in your regex. The corrected regex is in the sample code below.

  2. re.findall returns a list, so price is a list; to get the price itself, use price[0].

Sample code:

>>> regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
>>> pattern = re.compile(regex)
>>> price = re.findall(pattern, htmltext)
>>> price
[u'568.77']
>>> price[0]
u'568.77'
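Both fixes can be checked offline with a short Python 3 sketch. The one-line HTML snippet is a hypothetical stand-in for the downloaded page (it reuses the 568.77 value from the session above); it shows that the pattern with a space before = finds nothing, while the corrected pattern matches and returns a list whose first element is the price.

```python
import re

# Hypothetical stand-in for the downloaded page; the real page is much larger
htmltext = '<div><span id="yfs_l84_aapl">568.77</span></div>'
symbol = "aapl"

# Pattern from the question: "id =" never occurs in the page, so nothing matches
bad_regex = '<span id ="yfs_l84_' + symbol + '">(.+?)</span>'
# Corrected pattern: no space, exactly as the attribute appears in the HTML
good_regex = '<span id="yfs_l84_' + symbol + '">(.+?)</span>'

print(re.findall(bad_regex, htmltext))   # []
price = re.findall(good_regex, htmltext)
print(price)                             # ['568.77'] (findall returns a list)
if price:                                # guard against an empty list before indexing
    print("the price of", symbol, "is", price[0])
```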
Tamim Shahriar

It is never a good idea to parse HTML with regular expressions, because small differences in the markup (like the extra space here) silently break the pattern. I suggest letting a parser such as BeautifulSoup or lxml do the work for you. I would also replace the while loop with a for loop, as in my code below: you define i and increment it by hand anyway, so a for loop is a better fit here.
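BeautifulSoup and lxml are third-party packages. To illustrate the parser-based idea without installing anything, here is a minimal Python 3 sketch that swaps in the standard library's html.parser.HTMLParser instead; the class SpanTextFinder, the function find_price, and the sample HTML are hypothetical. Because the parser normalizes attributes, it finds the span regardless of spacing around =, quote style, or extra attributes, which is exactly where the regex broke.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of the first <span> whose id equals target_id.

    Assumes the price span contains no nested <span> elements.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False  # True while we are inside the matching span
        self.text = None     # stays None if the span is never found

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already stripped
        if tag == "span" and self.text is None and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.text = ""

    def handle_data(self, data):
        if self.inside:
            self.text += data  # text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self.inside = False


def find_price(htmltext, symbol):
    finder = SpanTextFinder("yfs_l84_" + symbol)
    finder.feed(htmltext)
    finder.close()
    return finder.text


# Extra spaces, single quotes and another attribute would all defeat the regex
sample = "<span class='price'  id = 'yfs_l84_aapl' >568.77</span>"
print(find_price(sample, "aapl"))  # 568.77
```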

As for what is wrong with your regex: Tamim is right, you have an extra space in the id= part of your pattern.
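If you want to keep a regex while learning, one option is a pattern that tolerates the kind of spacing difference that broke yours. The Python 3 sketch below (build_price_pattern is a hypothetical helper) accepts optional whitespace around =, either quote style, and whitespace before >, and uses re.escape so a symbol containing regex metacharacters is matched literally. It still assumes id is the only attribute on the span, which is one reason a parser remains the more robust choice.

```python
import re


def build_price_pattern(symbol):
    """Hypothetical helper: match <span id="yfs_l84_SYMBOL">PRICE</span>.

    Allows whitespace around '=', single or double quotes, and whitespace
    before '>', but still assumes id is the span's only attribute.
    """
    return re.compile(
        r'<span\s+id\s*=\s*(?P<q>["\'])yfs_l84_' + re.escape(symbol)
        + r'(?P=q)\s*>(?P<price>.+?)</span>'
    )


pattern = build_price_pattern("aapl")
samples = [
    '<span id="yfs_l84_aapl">568.77</span>',
    '<span id ="yfs_l84_aapl">568.77</span>',    # the spacing from the question
    "<span id = 'yfs_l84_aapl' >568.77</span>",
]
for html in samples:
    match = pattern.search(html)
    print(match.group("price") if match else None)
```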

import urllib
from bs4 import BeautifulSoup

symbolslist = ["aapl", "spy", "goog", "nflx"]
for symbol in symbolslist:
    url = "http://finance.yahoo.com/q?s=" + symbol + "&q1=1"
    htmltext = urllib.urlopen(url).read()
    # Name the parser explicitly; newer bs4 versions warn when it is omitted
    bs = BeautifulSoup(htmltext, "html.parser")
    # The price sits in <span id="yfs_l84_<symbol>">
    span = bs.find('span', {'id': 'yfs_l84_' + symbol})
    # find() returns None when no such span exists, so check before using .text
    price = span.text if span is not None else None
    print "the price of", symbol, "is", price
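The asker also tried Python 3.4, where the Python 2 code above cannot run as written: print is a function, urllib.urlopen has moved to urllib.request.urlopen, and read() returns bytes rather than str. A minimal sketch of the download step for Python 3 follows; quote_url and fetch_html are hypothetical helper names, and the sketch assumes the page is still served at the URL format used in the question.

```python
import urllib.request


def quote_url(symbol):
    # Same URL format as the question; assumes the page is still served this way
    return "http://finance.yahoo.com/q?s=" + symbol + "&q1=1"


def fetch_html(url, timeout=10):
    """Download url and return the body as str (Python 3)."""
    # Python 3 moved urllib.urlopen to urllib.request.urlopen
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # bytes, not str, in Python 3
        # Decode with the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

The returned string can then be handed to a parser, for example BeautifulSoup(fetch_html(quote_url(symbol)), "html.parser"), inside a for loop over the symbols. Keeping the download separate from the parsing also makes the parsing easy to test against saved HTML.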
heinst