I'm new to Python. I'm trying to fetch a web page using urllib. This works in a normal class. To clean up the code, I usually move helpers like this into static methods and call them from other classes.

When the code runs, the program stops immediately and no error appears in the console. If I don't call decode, there is no error, but the data is in bytes.

Can someone explain why this happens?

import sys
import urllib.request

class AppTool():

    @staticmethod
    def getURL(URL):
        result = ""

        try:
            request = urllib.request.Request(URL)
            response = urllib.request.urlopen(request)
            result = response.read().decode('utf-8')
            print("result : {}".format(result))

        except Exception:
            # A bare except would also swallow KeyboardInterrupt and SystemExit.
            print("Error: {}".format(sys.exc_info()))

        return result
user77177928

1 Answer

Try this instead of decode('utf-8'):

response.read().decode('utf-8', errors='ignore')

That said, I would recommend using Python's requests library. It has better error handling and makes it easier to get the content as decoded text.

For errors= you can choose one of:

'strict': raise an exception in case of an encoding error
'replace': replace malformed data with a suitable replacement marker, such as '?' or '\ufffd'
'ignore': ignore malformed data and continue without further notice
'xmlcharrefreplace': replace with the appropriate XML character reference (for encoding only)
'backslashreplace': replace with backslashed escape sequences (for encoding only before Python 3.5; since 3.5 it also works for decoding)
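
To make the difference between these handlers concrete, here is a small sketch. The byte string is made up for illustration (it is Latin-1 text, so it is not valid UTF-8), and `decode_with` is a hypothetical helper name, not part of any library.

```python
# Illustrative input: "café au lait" encoded as Latin-1, which is NOT valid UTF-8.
RAW = b"caf\xe9 au lait"


def decode_with(handler):
    """Decode RAW as UTF-8 using the given error handler name."""
    return RAW.decode("utf-8", errors=handler)


# 'strict' is the default and raises on the bad byte.
try:
    decode_with("strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict failed:", exc.reason)

print(decode_with("ignore"))            # bad byte silently dropped
print(decode_with("replace"))           # bad byte becomes U+FFFD
print(decode_with("backslashreplace"))  # bad byte shown as \xe9 (Python 3.5+)
```

Note that 'ignore' loses data without telling you, so it is better suited to a quick look at the content than to a real fix.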

Here is a much better answer: https://stackoverflow.com/a/517974/1463812
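
If you want to stay with urllib, one option is to decode with the charset the server declares rather than hard-coding UTF-8. This is only a sketch under assumptions: it assumes the server sends a correct charset in its Content-Type header, and `decode_body` and `fetch_text` are hypothetical helper names. `get_content_charset()` is the standard `email.message.Message` method available on `response.headers`.

```python
import urllib.request
from email.message import Message


def decode_body(body, headers, fallback="utf-8"):
    """Decode bytes using the charset from the Content-Type header, if any.

    Assumption: the declared charset is one Python knows; an unknown
    name would raise LookupError.
    """
    charset = headers.get_content_charset() or fallback
    # 'replace' keeps going on bad bytes but marks them with U+FFFD.
    return body.decode(charset, errors="replace")


def fetch_text(url):
    """Hypothetical helper: fetch a URL and return its body as str."""
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(), response.headers)
```

Calling `fetch_text("https://www.python.org/")` would then return a str instead of bytes. If the page declares no charset, the sketch falls back to UTF-8, which may still be the wrong guess.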

JSBach
  • It means the page is not in "UTF-8". Can you share the URL? It may be one of the ISO-xxxx or Windows-xxx encoded pages. In that case you have to decode according to the original encoding first; then you can encode to UTF-8 or whatever you prefer. – JSBach May 13 '16 at 09:31
  • The URL is https://www.python.org/. It works if I don't put it in a static method, and if I remove the decode I can see the bytes without error. – user77177928 May 13 '16 at 09:33
  • You should not do this regardless. You don't ignore errors, you fix them – Padraic Cunningham May 13 '16 at 09:34
  • It depends on your usage. If you would like to find the result, you may first ignore so that you have an idea what it looks like when they are missing. Matter of preference for the path to the solution. – JSBach May 13 '16 at 09:35
  • By the way, weirdly enough, this works for me with Python 3.5.1 on python.org (even as a staticmethod). Just to let you know. The error that comes up is about the start byte, which is not even a "visible byte". – JSBach May 13 '16 at 09:39