I am receiving an error when trying to read and decode a URL using "utf-8"

Question

I am very new to Python and don't understand why this does not work, since the same code appears to work for other people.

import re
import urllib.request
#https://finance.yahoo.com/quote/
url = "https://finance.yahoo.com/quote/"
stock = input("Enter your stock: ")
url = url + stock
print(url)
# prints: https://finance.yahoo.com/quote/AAPL
data = urllib.request.urlopen(url).read()
data = urllib.request.urlopen(url).read()
data1 = data.decode("utf-8")

The error message returns:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    data1 = data.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Please let me know how I can solve this.

Is Yahoo responding with a gzipped payload? See also [this question](https://stackoverflow.com/questions/44659851/unicodedecodeerror-utf-8-codec-cant-decode-byte-0x8b-in-position-1-invalid). urllib is a very low level library, it won't deal with encoding formats for you. Suggest using [requests library](https://requests.readthedocs.io/en/latest/) for this. — Nick ODell, Oct 13 '22 at 21:58
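
One way to check the gzip guess without touching the network: every gzip stream starts with the two bytes 0x1f 0x8b, which matches the "byte 0x8b in position 1" in the traceback. A minimal sketch, where `looks_gzipped` and the sample payload are made up for illustration (a real body would come from `urlopen(url).read()`):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(raw: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


# Hypothetical stand-in for a response body; not real Yahoo data.
sample = gzip.compress("<html>AAPL</html>".encode("utf-8"))

print(looks_gzipped(sample))                # True
print(looks_gzipped(b"<html>AAPL</html>"))  # False

# Decoding the compressed bytes directly fails at the same place as in the question.
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, hex(sample[exc.start]))  # 1 0x8b
```

If `looks_gzipped(data)` is true for the bytes you got back, the body needs to be decompressed before it is decoded.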

score 2 · Answer 1 · answered Oct 13 '22 at 22:08

The content is gzipped, here's what you're probably after:

import urllib.request
import gzip

url = "https://finance.yahoo.com/quote/"
stock = input("Enter your stock: ")
url = url + stock
print(url)

request = urllib.request.Request(url)
request.add_header('Accept-encoding', 'gzip')
data = gzip.GzipFile(fileobj=urllib.request.urlopen(request)).read()

content = data.decode("utf-8")
print(content)

Or you can use a third-party library such as requests, as user @NickODell suggests in the comments, but it's not required, as this example shows.

A shorter version, closer to what you had, that skips the explicit Accept-encoding header:

import urllib.request
import gzip

url = "https://finance.yahoo.com/quote/"
stock = input("Enter your stock: ")
url = url + stock
print(url)

data = gzip.GzipFile(fileobj=urllib.request.urlopen(url)).read()  # this also works

content = data.decode("utf-8")
print(content)
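
Both versions above assume the body is always gzipped. If you would rather not rely on that, a rough sketch of a more defensive helper could look like the following. The name `decode_body` and its parameters are my own, not part of urllib, and the commented lines show how it might be wired to a response:

```python
import gzip


def decode_body(raw: bytes, content_encoding: str = "", charset: str = "utf-8") -> str:
    """Decompress raw if it is gzip-encoded, then decode it to text."""
    # Trust the Content-Encoding header if given, otherwise fall back to the gzip magic bytes.
    is_gzip = "gzip" in content_encoding.lower() or raw[:2] == b"\x1f\x8b"
    if is_gzip:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Possible use with urllib (left as comments so the sketch runs offline):
# with urllib.request.urlopen(request) as response:
#     content = decode_body(response.read(), response.headers.get("Content-Encoding", ""))

# Offline check with a made-up payload.
html = "<html>AAPL</html>"
print(decode_body(gzip.compress(html.encode("utf-8")), "gzip"))
print(decode_body(html.encode("utf-8")))
```

This only handles gzip and assumes the page is UTF-8; other encodings such as deflate or br would need their own branch.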

It's unclear why you requested the content twice - if that's part of your problem, please explain it in the question:

data = urllib.request.urlopen(url).read()
data = urllib.request.urlopen(url).read()

score -2 · Answer 2 · answered Oct 13 '22 at 22:18

You can ignore the error with:

data.decode('utf-8', 'ignore')

That will get rid of your error, but you're left with a long string that looks like mostly gibberish. What is your end goal here? You might be better off using the requests library.

Requests

CSEngiNerd

Ignoring the error solves nothing. – Sören Nov 11 '22 at 23:29
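
To make that concrete, here is a small offline sketch, using a made-up payload rather than the real Yahoo response, that compares decoding with errors ignored against decompressing first:

```python
import gzip

# Hypothetical page content standing in for the real response body.
original = "<html>AAPL quote page</html>"
compressed = gzip.compress(original.encode("utf-8"))

# No exception is raised, but the result is still compressed data with bytes dropped.
lossy = compressed.decode("utf-8", "ignore")

# Decompressing before decoding recovers the text.
restored = gzip.decompress(compressed).decode("utf-8")

print(lossy == original)     # False
print(restored == original)  # True
```

The 'ignore' handler only hides the symptom; the string it returns cannot be turned back into the page.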
