Note the link listed in the comment is for Python 2.7, but this question pertains to Python 3.7.

I'm using Python 3.7 and Django. I want to read from a URL that has special characters in its string, but I get errors when I try the traditional way:

>>> url = "https://www.supergaming.com/f/gaming/article/pvmqe/was_browsing_the_steam_app_reviews_and_ಠ_ಠ/"
...
>>> html = urllib2.urlopen(req, 5000).read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1240, in _send_request
    self.putrequest(method, url, **skips)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1107, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0ca0' in position 69: ordinal not in range(128)

So I tried the solution recommended in "How to convert a url string to safe characters with python?", but I'm still unable to read the URL:

>>> urllib.parse.quote_plus(url)
'https%3A%2F%2Fwww.supergaming.com%2Ff%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'
>>> req = urllib2.Request(urllib.parse.quote_plus(url))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 328, in __init__
    self.full_url = url
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 354, in full_url
    self._parse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'https%3A%2F%2Fwww.supergaming.com%2Ff%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'

What's the proper way to read from a URL if it contains special characters?

Dave
  • Possible duplicate of [UnicodeError: URL contains non-ASCII characters (Python 2.7)](https://stackoverflow.com/questions/33708059/unicodeerror-url-contains-non-ascii-characters-python-2-7) – Zach Gates Sep 03 '19 at 19:36
  • That link you listed is for Python 2.7. My question pertains to Python 3.7. – Dave Sep 03 '19 at 20:15
  • Did you try it? I believe the same answer applies. – Zach Gates Sep 03 '19 at 20:56
  • I get the error, "AttributeError: module 'urllib' has no attribute 'quote'". – Dave Sep 03 '19 at 21:17
  • In Python 3 that would be `urllib.request.quote`. – Zach Gates Sep 03 '19 at 21:18
  • Actually in python 3 the function is [urllib.parse.quote](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote) - I've edited the accepted answer on the duplicate candidate to mention this. – snakecharmerb Sep 04 '19 at 05:22
  • You're right, that's the correct usage. But for whatever reason, `urllib.request.quote` is equivalent. – Zach Gates Sep 04 '19 at 22:09
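
For what it's worth, the approach the comments point toward can be sketched roughly as follows. Quoting the whole URL also escapes the `://` and the slashes, which is why `Request` raises "unknown url type". The sketch instead splits the URL and percent-encodes only the path, query, and fragment. `quote_url` is a hypothetical helper name (not part of the standard library), and the sketch assumes the host name is already plain ASCII.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in a URL while keeping its structure.

    Hypothetical helper for illustration; assumes the host name is ASCII.
    """
    parts = urlsplit(url)
    # Leave "/" and existing "%" escapes alone; other unsafe characters
    # are encoded as UTF-8 percent-escapes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


url = "https://www.supergaming.com/f/gaming/article/pvmqe/was_browsing_the_steam_app_reviews_and_ಠ_ಠ/"
safe_url = quote_url(url)
print(safe_url)
```

The resulting string is pure ASCII with the scheme and host untouched, so it should be acceptable to `urllib.request.urlopen(safe_url, timeout=5)`. Passing `timeout` by keyword matters, because the second positional argument of `urlopen` is `data`, not the timeout.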

0 Answers