I'm trying to retrieve stock information from Yahoo Finance. I have figured out how to use re.findall to get the prices into a list. If a stock symbol/price does not exist, I have found a way to retrieve that as well, as ['No such ticker symbol']. My issue is that I need the prices and the 'No such ticker symbol' entries in the same list, in the order they appear on the page. This is my code so far. Is it possible to give findall() two patterns so that it puts both into one list?

import urllib.request
import re

li = [i.strip().split() for i in open("Portfolio.txt").readlines()]
li[0:26] =[]
li = [x for x in li if x]
li.sort()


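def retrieve_page(url):
    # read() replaces readall(), which HTTPResponse does not offer in current Python 3
    with urllib.request.urlopen(url) as my_socket:
        dta = my_socket.read().decode("utf-8", errors="replace")
    # Each findall call scans the whole page on its own, so the results land in two lists
    price = re.findall(r'<td class="col-price cell-raw:(.*?)"><span', dta)
    noprice = re.findall(r'<span class ="no-symbol">(.*?):<strong>', dta)
    print(price)
    print(noprice)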

retrieve_page("http://finance.yahoo.com/quotes/AAPL,GOOG,HWP,IBM,MSFT")

My output is as follows:

['107.120003', '552.25', '164.478699', '46.0938']
['No such ticker symbol']
smci

1 Answer

If it were me, I'd avoid parsing HTML with a regular expression and use BeautifulSoup instead:

import requests
from bs4 import BeautifulSoup

def retrieve_page(url):
    dta = requests.get(url).text
    soup = BeautifulSoup(dta, "html.parser")  # name the parser explicitly to avoid a warning
    price = soup.find_all(class_=["col-price", "invalid-symbol"])
    price = [next(x.strings) for x in price]
    # fix up ': '
    price = [x.replace(': ','') for x in price]
    print(price)

retrieve_page("http://finance.yahoo.com/quotes/AAPL,GOOG,HWP,IBM,MSFT")

Result:

['106.54', '547.45', 'No such ticker symbol', '163.86', '45.86']
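
If you would rather stay with re, the literal question (two patterns in one findall()) can be answered with alternation: join the two patterns with | and findall() returns one tuple per match, in page order, with exactly one group filled in. Below is a minimal sketch. SAMPLE_HTML and prices_in_order are made up for illustration, and the markup is only modelled on the two patterns from the question, so the real page may well differ.

```python
import re

# Hypothetical markup shaped after the two regexes in the question;
# it is an assumption, not a copy of the real Yahoo page.
SAMPLE_HTML = (
    '<td class="col-price cell-raw:107.120003"><span>107.12</span></td>'
    '<td class="col-price cell-raw:552.25"><span>552.25</span></td>'
    '<span class ="no-symbol">No such ticker symbol:<strong>HWP</strong></span>'
    '<td class="col-price cell-raw:164.478699"><span>164.48</span></td>'
)

# The two original patterns joined with "|" (alternation).
PATTERN = re.compile(
    r'<td class="col-price cell-raw:(.*?)"><span'
    r'|<span class ="no-symbol">(.*?):<strong>'
)


def prices_in_order(html):
    # With two groups, findall() yields (price, missing) tuples in document
    # order; the group that did not take part in the match is an empty string.
    return [price or missing for price, missing in PATTERN.findall(html)]


print(prices_in_order(SAMPLE_HTML))
```

This keeps everything in a single pass over the page, which is what preserves the order. It is still as fragile as any regex run against HTML, so I would only use it when installing BeautifulSoup is not an option.
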
Robᵩ