How to scrape stock price information from given HTML?

Question

I'm trying to retrieve stock information from Yahoo Finance. I have figured out how to use re.findall to get the prices into a list. If the stock symbol/price does not exist, I have found a way to retrieve that as ['No such ticker symbol']. My issue is that I need the prices and the 'No such ticker symbol' entries in the same list, in order. This is my code so far. Is it possible to have two patterns in findall() so it can put them both into one list?

import urllib.request
import re

li = [i.strip().split() for i in open("Portfolio.txt").readlines()]
li[0:26] =[]
li = [x for x in li if x]
li.sort()


def retrieve_page(url):
    my_socket = urllib.request.urlopen(url)
    dta = str(my_socket.readall())
    my_socket.close()
    price = re.findall((r'<td class="col-price cell-raw:(.*?)"><span'), dta)
    noprice = re.findall(r'<span class ="no-symbol">(.*?):<strong>', dta)
    print(price)
    print(noprice)

retrieve_page("http://finance.yahoo.com/quotes/AAPL,GOOG,HWP,IBM,MSFT")

My output is as follows

['107.120003', '552.25', '164.478699', '46.0938']
['No such ticker symbol']

You could probably chain the two expressions together with a `|`. — anon582847382, Oct 30 '14 at 18:15
Like this? Because I get an error when I do it price = re.findall((r'(.*?): ') , dta) — Sam Kluender, Oct 30 '14 at 18:22
Is that precisely your actual program, copy-pasted into place? Or is it almost exactly your program, maybe re-typed? Because when I run **that** program, I don't get the output you describe. — Robᵩ, Oct 30 '14 at 18:34
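
The `|` suggestion from the first comment could be sketched roughly as below. This is only an illustration: `SAMPLE_HTML` is hypothetical markup modelled on the two patterns in the question (the real Yahoo Finance page may differ), and `PATTERN` and `extract_prices` are names made up for the example.

```python
import re

# Hypothetical markup modelled on the two patterns in the question;
# the real page may be structured differently.
SAMPLE_HTML = (
    '<tr><td class="col-price cell-raw:107.120003"><span>107.12</span></td></tr>'
    '<tr><td class="col-price cell-raw:552.25"><span>552.25</span></td></tr>'
    '<tr><td><span class="no-symbol">No such ticker symbol: <strong>HWP</strong></span></td></tr>'
    '<tr><td class="col-price cell-raw:164.478699"><span>164.48</span></td></tr>'
    '<tr><td class="col-price cell-raw:46.0938"><span>46.09</span></td></tr>'
)

# Two alternatives joined with `|`, each with its own capture group.
PATTERN = re.compile(
    r'<td class="col-price cell-raw:(.*?)"><span'  # group 1: raw price
    r'|'
    r'<span class="no-symbol">(.*?):\s*<strong>'   # group 2: error text
)


def extract_prices(html):
    # With two groups, findall() returns (group1, group2) tuples. The
    # alternative that did not match contributes an empty string, so
    # `or` selects whichever group actually matched.
    return [price or error for price, error in PATTERN.findall(html)]


print(extract_prices(SAMPLE_HTML))
```

Because `findall()` scans the string left to right, the prices and the error entries come back in page order, which is what the question asks for. The pattern is still tied to the exact markup, so any change to the page would break it.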

score 3 · Answer 1 · edited May 23 '17 at 12:17

If it were me, I'd avoid parsing HTML with a regular expression and use BeautifulSoup instead:

import requests
from bs4 import BeautifulSoup

def retrieve_page(url):
    dta = requests.get(url).text
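    # Name the parser explicitly so bs4 does not have to guess (and warn).
    soup = BeautifulSoup(dta, "html.parser")
    # One query for both kinds of cell keeps the results in page order.
    price = soup.find_all(class_=["col-price", "invalid-symbol"])
    # Take the first text fragment of each cell: the price or the error message.
    price = [next(x.strings) for x in price]
    # Strip the ': ' that trails the error message.
    price = [x.replace(': ', '') for x in price]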
    print(price)

retrieve_page("http://finance.yahoo.com/quotes/AAPL,GOOG,HWP,IBM,MSFT")

Result:

['106.54', '547.45', 'No such ticker symbol', '163.86', '45.86']

answered Oct 30 '14 at 19:04

Robᵩ

OP is using python 3 `next(x.strings) ` – Padraic Cunningham Oct 30 '14 at 19:16
