Unable to print the data retrieved using the regex module (strictly only this) in Python?

Question

I am using the 're' module in Python to scrape a web page. The loop runs four times, and each iteration returns an empty list, [], but the output should be the stock price for the given stock symbol. The regex string itself looks correct, since it prints as expected. The source code is included below.

import urllib
import re

symbolslist = ["appl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
        url ="http://in.finance.yahoo.com/q?s=" +symbolslist[i] +"&ql=1"
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        regex ='<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
        pattern = re.compile(regex)
        print regex
        price = re.findall(pattern,htmltext)
        print "price of ",symbolslist[i],"is",price
        i+=1

The script runs without any syntax or indentation errors, and the output looks like this:

<span id="yfs_l84_appl">(.+?)</span>
price of  appl is []
<span id="yfs_l84_spy">(.+?)</span>
price of  spy is []
<span id="yfs_l84_goog">(.+?)</span>
price of  goog is []
<span id="yfs_l84_nflx">(.+?)</span>
price of  nflx is []

The list that should contain the stock price is empty each time.

The page being scraped is https://in.finance.yahoo.com/q?s=NFLX&ql=0

Print the htmltext variable before running the regex on it. I would not be surprised if the actual stock price is updated using AJAX and is not in the HTML you receive. – jpou, Feb 14 '16 at 13:02
@jpou the same program works for a single stock symbol and fails when we run it over a list of symbols – SaiKiran, Feb 14 '16 at 13:04
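
The check suggested in the comment above can be sketched offline. This is only an illustration: the helper name find_span_ids and the sample string are hypothetical, and the sample is made-up markup standing in for htmlfile.read(), not real Yahoo output. It lists every span id that starts with yfs_ so you can see whether the id the regex expects is present in the text at all.

```python
import re

def find_span_ids(htmltext, prefix="yfs_"):
    """Return every span id in htmltext that starts with prefix (hypothetical helper)."""
    pattern = re.compile(r'<span[^>]*\bid="(' + re.escape(prefix) + r'[^"]*)"')
    return pattern.findall(htmltext)

# Made-up sample standing in for htmlfile.read(); not real Yahoo markup.
sample = ('<div><span id="yfs_l84_nflx">87.40</span>'
          '<span id="yfs_c63_nflx">-1.05</span></div>')

ids = find_span_ids(sample)
print(ids)

# The id the question's regex is looking for, built the same way.
expected_id = "yfs_l84_" + "nflx"
print(expected_id in ids)
```

If the expected id is missing from the list for the real page, the problem is in what the server returned (or in the symbol used to build the id), not in re.findall.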

Martin Evans · Accepted Answer · 2016-02-14T12:57:49.203


As an alternative approach, you might find it easier to use the yahoo_finance library as follows:

from yahoo_finance import Share

for symbol in ["appl", "spy", "goog", "nflx"]:
    yahoo = Share(symbol)
    print 'Price of {} is {}'.format(symbol, yahoo.get_price())

Giving you the following output:

Price of appl is 96.11
Price of spy is 186.63
Price of goog is 682.40
Price of nflx is 87.40

It is never a wise move to try and parse HTML data using regular expressions.
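
A small illustration of why, using made-up markup (none of it is real Yahoo output, and the symbol nflx is only an example): the pattern from the question matches one exact spelling of the tag, so harmless changes to quoting, attribute order, or letter case make it return an empty list.

```python
import re

# Same pattern shape as in the question, for the example symbol "nflx".
pattern = re.compile(r'<span id="yfs_l84_nflx">(.+?)</span>')

# Made-up variants of the same element, all equivalent or near-equivalent HTML.
variants = {
    "exact": '<span id="yfs_l84_nflx">87.40</span>',
    "single quotes": "<span id='yfs_l84_nflx'>87.40</span>",
    "extra attribute": '<span class="price" id="yfs_l84_nflx">87.40</span>',
    "upper-case id": '<span id="yfs_l84_NFLX">87.40</span>',
}

# Run the pattern against each variant and collect what it finds.
results = {name: pattern.findall(html) for name, html in variants.items()}
for name in sorted(results):
    print("{}: {}".format(name, results[name]))
```

Only the "exact" variant yields a match; an HTML parser treats the first three variants as the same element.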

Another approach would be to extract the information first using BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import re

for symbol in ["appl", "spy", "goog", "nflx"]:
    url = 'http://finance.yahoo.com/q?s={}'.format(symbol)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")

    data = soup.find('span', attrs={'id': re.compile(r'yfs_.*?_{}'.format(symbol.lower()))})
    print 'Price of {} is {}'.format(symbol, data.text)

edited Feb 14 '16 at 12:57

answered Feb 14 '16 at 12:55


But my approach is to use regex module only for now – SaiKiran Feb 14 '16 at 12:57

@SaiKiranUppu: [***NEVER USE REGEX PARSE HTML***](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Remi Guan Feb 14 '16 at 13:15
Click on the link, it will take you to the website where you can download and install the library. – Martin Evans Feb 14 '16 at 13:27
