
This program is a very simple example of web scraping. The program's goal is to go online, find a specific stock, and then tell the user the price that stock is currently trading at. However, when I run the code, this error message comes up:

    Traceback (most recent call last):
      File "HTMLwebscrape.py", line 15, in <module>
        price = re.findall(pattern,htmltext)
      File "C:\Python34\lib\re.py", line 210, in findall
        return _compile(pattern, flags).findall(string)
    TypeError: can't use a string pattern on a bytes-like object
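
The traceback comes from mixing types: in Python 3, `urlopen(...).read()` returns `bytes`, while a regex built from a `str` literal is a `str` pattern, and `re` refuses to match one against the other. A minimal offline reproduction, using a made-up HTML fragment in place of the downloaded page (the span id and the price `123.45` are hypothetical), shows the error and the decode-first fix:

```python
import re

# Hypothetical stand-in for urlopen(...).read(): in Python 3 that call
# returns bytes, not str. The id and price here are made up.
html_bytes = b'<span id="yfs_184_AAPL">123.45</span>'

# A pattern compiled from a str literal is a str pattern.
pattern = re.compile('<span id="yfs_184_AAPL">(.+?)</span>')

try:
    pattern.findall(html_bytes)  # str pattern applied to bytes
except TypeError as exc:
    print("TypeError:", exc)  # same kind of error as in the traceback

# Decoding the bytes first gives a str, so the types now match.
html_text = html_bytes.decode("utf-8")
print(pattern.findall(html_text))  # ['123.45']
```

Alternatively, a bytes pattern (`b'...'`) can be applied to the bytes directly, but the matches then come back as bytes as well.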

Below is the actual script of the program. I've tried to find a way to fix this, but so far I've been unable to. I'm running this on Python 3 and using Sublime Text as my text editor. Thank you in advance!

    import urllib
    import re
    from urllib.request import Request, urlopen
    from urllib.error import  URLError

    symbolslist = ["AAPL","SPY","GOOG","NFLX"]

    i=0

    while i < len(symbolslist):
        url = Request("http://finance.yahoo.com/q?s=AAPL&ql=1")
        htmlfile = urlopen(url)
        htmltext = htmlfile.read()
        regex  = '<span id="yfs_184_'+symbolslist[i] + '"">(.+?)</span>'
        pattern = re.compile(regex)
        price = re.findall(pattern,htmltext)
        print (price)
        i+=1
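
Besides the bytes/str mismatch, the loop has two problems that would keep it from returning prices even after decoding: the URL always requests `s=AAPL` instead of the current symbol, and the regex ends in `'"">'`, two quote characters before `>`, which a normal `id="..."` attribute never contains. Below is a sketch of the loop with those fixed. The URL format and the `yfs_184_` id prefix are copied from the question and should be treated as assumptions: Yahoo's pages are likely to have changed, so the id is worth checking against the actual page source (upper vs lower case, the digit `1` vs the letter `l`). The `User-Agent` header is also an assumption, added because some sites answer scripts differently from browsers.

```python
import re
from urllib.request import Request, urlopen

SYMBOLS = ["AAPL", "SPY", "GOOG", "NFLX"]

# Assumption: URL format and span id prefix copied from the question; the
# live page may no longer use either.
URL_TEMPLATE = "http://finance.yahoo.com/q?s={symbol}&ql=1"
ID_PREFIX = "yfs_184_"


def fetch_html(url):
    """Download url and return its body decoded to str."""
    # Assumption: a browser-like User-Agent; some sites serve scripts differently.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_prices(html_text, symbol):
    """Return the text of every <span> whose id is ID_PREFIX + symbol."""
    # A single closing quote before '>' (the question's regex had two), and
    # re.escape so a symbol such as 'BRK.B' is matched literally.
    regex = '<span id="' + re.escape(ID_PREFIX + symbol) + '">(.+?)</span>'
    return re.findall(regex, html_text)


def main():
    # Iterate over the symbols directly instead of indexing with a counter,
    # and build each URL from the current symbol.
    for symbol in SYMBOLS:
        html_text = fetch_html(URL_TEMPLATE.format(symbol=symbol))
        print(symbol, extract_prices(html_text, symbol))
```

`main()` makes live HTTP requests, so it is not called here; `extract_prices` can be checked offline against a saved copy of the page before pointing the script at the network.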
  • Convert htmltext to a string. It's currently a bytes string – Programmer Oct 07 '15 at 20:05
  • When I convert htmltext to a string, the output is simply [] [] [] []. It doesn't actually give the value of the stock. Would you happen to know why this happens? – Keshav Sota Oct 07 '15 at 20:09
  • And of course, the way you do that is to `decode` it using an appropriate codec. e.g. `htmltext = htmlfile.read().decode('utf-8')` – mgilson Oct 07 '15 at 20:09
  • @KeshavSota -- There could be lots of reasons. e.g. your [regex could be wrong](http://stackoverflow.com/a/1732454/748858) ;-) or the site could be detecting that you're trying to scrape it programmatically and giving you a response that you don't expect. – mgilson Oct 07 '15 at 20:11
  • You can try `re.findall(pattern,htmltext.decode("UTF-8"))` – Jose Ricardo Bustos M. Oct 07 '15 at 20:40
  • When I do this, all I get is: [] [] [] []. I don't actually get any values. Any reason why this happens? – Keshav Sota Oct 08 '15 at 01:04
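
An empty list after decoding means the download worked but the pattern found nothing. One cause is in the question's own regex, which ends in `'"">'` and so requires two quote characters before `>`. The offline check below, on a hypothetical fragment, shows that regex returning `[]` while the single-quote version matches. The `show_context` helper is hypothetical (not from the thread): it collects the text around each occurrence of a marker such as `yfs_`, which makes it easy to see the id the page really uses, or to notice that the page contains no such span at all, as can happen when a site serves scripts a different response.

```python
import re

# Hypothetical fragment standing in for the decoded page.
html_text = '<td><span id="yfs_184_AAPL">123.45</span></td>'

# The regex as the question builds it: note '"">' (two quotes) at the end.
question_regex = '<span id="yfs_184_' + "AAPL" + '"">(.+?)</span>'
# The same regex with a single closing quote.
fixed_regex = '<span id="yfs_184_' + "AAPL" + '">(.+?)</span>'

print(re.findall(question_regex, html_text))  # []
print(re.findall(fixed_regex, html_text))     # ['123.45']


def show_context(text, marker, width=30):
    """Return snippets of text around each occurrence of marker."""
    snippets = []
    start = text.find(marker)
    while start != -1:
        snippets.append(text[max(0, start - width):start + len(marker) + width])
        start = text.find(marker, start + 1)
    return snippets


# Look at what the page actually contains near 'yfs_' before trusting a regex.
for snippet in show_context(html_text, "yfs_"):
    print(repr(snippet))
```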
