
I am trying to get some information from Google Finance, but I am getting this error:

AttributeError: 'HTTPResponse' object has no attribute 'split'

Here is my Python code:

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")

symbolslist = symbolsfile.read()

thesymbolslist = symbolslist.split("\n")

i=0


while i<len (thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen (theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i= i+1
Danny Herbert
Zepol

2 Answers


The Cause of the Problem

This is because urllib.request.urlopen(theurl) returns an HTTPResponse object representing the connection, not a string, so it has no split() method.
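A quick way to confirm this without any network access, as a minimal sketch: check which attributes the response class and str actually have. The variable names here are illustrative only.

```python
import http.client

# For http:// URLs, urlopen() hands back an http.client.HTTPResponse.
response_class = http.client.HTTPResponse

# The response object can be read from, but it is not a string.
response_has_split = hasattr(response_class, "split")
response_has_read = hasattr(response_class, "read")
str_has_split = hasattr(str, "split")

print(response_has_split, response_has_read, str_has_split)
```

The response class offers read() but not split(), which is exactly what the AttributeError is reporting.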


The Solution

To read data from this connection and actually get a string, you need to do

thepage = urllib.request.urlopen(theurl).read()

and then the rest of your code should follow naturally.
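One caveat in Python 3 is that read() returns bytes rather than str. The sketch below uses a made-up byte string in place of a real response (the contents are an assumption, not actual Google Finance output) to show that split() works on bytes, while joining a piece to a str needs an explicit decode.

```python
# Made-up stand-in for what urlopen(theurl).read() might return
# (an assumption for illustration, not real Google Finance output).
raw = b"CLOSE\n101.5\n102.25\n"

# bytes objects do have split(), but every piece is still bytes.
# [-1] picks the last piece, equivalent to indexing with len(...) - 1.
last_raw = raw.split()[-1]

try:
    # Mixing str and bytes in a concatenation raises TypeError in Python 3.
    message = "price is " + last_raw
except TypeError:
    # Decoding first turns the bytes into a str that can be concatenated.
    message = "price is " + last_raw.decode("utf-8")

print(message)
```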

Addendum to the Solution

In Python 3, read() always returns a bytes object rather than a str, so combining the result with a regular string raises TypeError: Can't convert 'bytes' object to str implicitly.

The right approach to dealing with that is to find the correct character encoding and decode the bytestring into a regular string using it, as seen in this question:

thepage = urllib.request.urlopen(theurl)
# read the character encoding from the `Content-Type` response header;
# get_content_charset() returns None when no charset is declared, so fall back to UTF-8
charset_encoding = thepage.info().get_content_charset() or 'utf-8'
# decode the raw bytes into a str
thepage = thepage.read().decode(charset_encoding)

It is sometimes safe to make the assumption that the character encoding is utf-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

does work more often than not. It's a statistically good guess if nothing else.
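The two approaches can be combined into one helper, sketched here under some assumptions: decode_body and fetch_text are hypothetical names, the UTF-8 fallback is a guess rather than a guarantee, and errors="replace" is one possible policy for bytes that do not fit the chosen encoding. The demo builds headers by hand so it runs without a network connection.

```python
import urllib.request
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in the headers, if any."""
    # get_content_charset() returns None when Content-Type has no charset.
    charset = headers.get_content_charset() or fallback
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")


def fetch_text(url):
    """Hypothetical wrapper: fetch a URL and return its body as a str."""
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(), response.headers)


# Offline demo with hand-built headers standing in for a real response.
latin_headers = Message()
latin_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
latin_text = decode_body("café".encode("latin-1"), latin_headers)

bare_headers = Message()  # no Content-Type at all, so the fallback applies
bare_text = decode_body(b"102.25", bare_headers)

print(latin_text, bare_text)
```

In the original loop, fetch_text(theurl).split()[-1] would then give the last whitespace-separated field as a str.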

Akshat Mahajan
  • once i did that it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly – Zepol May 22 '16 at 02:30
  • It's because the encoding of the string you're receiving is not something Python understands. Give me a minute to provide a fix. – Akshat Mahajan May 22 '16 at 02:37
  • Your solution is more robust since it does not depend on the source encoding, so OP: better mark this one the right answer :) – le_m May 22 '16 at 02:50

Checking the documentation might save you time in the future. It says that the urlopen() function returns an HTTPResponse object, which has a read() method. In Python 3, read() returns bytes, so you need to decode the output from the source encoding, in this case UTF-8. So just write

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')
le_m
  • once i did that it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly – Zepol May 22 '16 at 02:32
  • Python 3? Then see http://stackoverflow.com/questions/16699362/python3-error-typeerror-cant-convert-bytes-object-to-str-implicitly/16699591#16699591 Try `thepage = urllib.request.urlopen(theurl).read().decode('utf-8')` – le_m May 22 '16 at 02:37
  • @le_m That assumes the default encoding is `utf-8` - which is often true, but is not necessarily the encoding sent over. The correct way to do it is to check the encoding in the headers and apply that. – Akshat Mahajan May 22 '16 at 02:43
  • @AkshatMahajan You are right, of course, but since OP is just querying google.com we can safely assume UTF-8. – le_m May 22 '16 at 02:46
  • @le_m [You would be surprised what character encodings Google uses in lieu of UTF-8](http://stackoverflow.com/questions/36877016/typeerror-str-does-not-support-the-buffer-interface-in-html2text/36879764#36879764)... – Akshat Mahajan May 22 '16 at 02:48