I am trying to open a JSON response from an API whose URL includes characters of the Polish alphabet. I have tried encoding the URL as UTF-8, but I still run into problems. Below are the code I wrote and the error it produces.

import urllib.request as request
import json
url='https://api.um.warszawa.pl/api/action/dbtimetable_get?id=myapiID&busstopId=wartość&busstopNr=wartość&line=wartość&apikey=wartość'
url=url.encode('utf-8')
with request.urlopen(url) as response:
    source = response.read()
    data = json.loads(source)

This raises the error: AttributeError: 'bytes' object has no attribute 'timeout'.

snakecharmerb
DAVID LOBO
  • post the full traceback. are you on python 2 or 3? – Paul H Sep 17 '20 at 16:15
  • Could you also try posting the result of printing the object that it says has no attribute 'timeout' ? – swarles-barkley Sep 17 '20 at 16:47
  • That's interesting... here somebody seemed to have solved that https://stackoverflow.com/questions/1916684/cant-open-unicode-url-with-python. But I tried your version and I also have the `timeout` error: `AttributeError: 'bytes' object has no attribute 'timeout'`. I tried to tweak it with a custom class: `class StringWithTimeout(str): def __new__(cls, string, timeout): obj = str.__new__(cls, string) setattr(obj, 'timeout', timeout) return obj`. But then I get `URLError: ` – deponovo Sep 17 '20 at 17:17
  • Which python version are you using? – deponovo Sep 17 '20 at 17:21
  • Yet another potential solution https://stackoverflow.com/questions/36395705/unicode-string-in-urllib-request – deponovo Sep 17 '20 at 17:32
  • I am using python 3.7.3 @deponovo – DAVID LOBO Sep 18 '20 at 19:14

1 Answer

There are two problems here, probably both stemming from the requirement to access a URL whose query components include non-ASCII characters.

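  • Firstly, passing a bytes instance to urlopen leads to unexpected behaviour: urlopen expects a str or a Request object, and anything that is not a str is treated as a Request, hence the complaint about a missing 'timeout' attribute.
  • Secondly, non-ASCII characters are not permitted in a URL's query parameters, so the query string must be percent-encoded (URL-encoded).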
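To make the second point concrete, here is a minimal sketch of what percent-encoding does to a single value. It uses 'wartość', the placeholder value from the question's URL; the variable names are only illustrative.

```python
from urllib import parse

# 'wartość' is the placeholder value taken from the question's URL.
value = 'wartość'

# quote() encodes the text as UTF-8 and percent-encodes every non-ASCII byte.
encoded = parse.quote(value)
print(encoded)

# unquote() reverses the transformation and returns the original text.
decoded = parse.unquote(encoded)
print(decoded)
```

The encoded form contains only ASCII characters, which is what urlopen needs in a URL.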

So, given the invalid URL from the question, you need to do something like this:

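import json
from urllib import parse
from urllib import request

# The URL from the question, still containing non-ASCII characters.
url = 'https://api.um.warszawa.pl/api/action/dbtimetable_get?id=myapiID&busstopId=wartość&busstopNr=wartość&line=wartość&apikey=wartość'

parts = parse.urlsplit(url)
query_dict = parse.parse_qs(parts.query)
# parse_qs maps each key to a list of values, so doseq=True is needed here.
encoded_query = parse.urlencode(query_dict, doseq=True)
fixed_url = parse.urlunsplit((parts.scheme, parts.netloc, parts.path, encoded_query, parts.fragment))

# Pass a str (not bytes) to urlopen.
with request.urlopen(fixed_url) as response:
    print(json.load(response))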
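One detail worth checking when round-tripping a query through parse_qs: it returns a list for every key, and urlencode only expands those lists when doseq=True is passed. The sketch below compares the two calls on a shortened, made-up query string (not the real API parameters).

```python
from urllib import parse

# A shortened, hypothetical query string used only for illustration.
query_dict = parse.parse_qs('line=wartość&id=abc')
print(query_dict)  # every value is a list

# Without doseq, each list is converted with str() and its repr is encoded.
without_doseq = parse.urlencode(query_dict)
print(without_doseq)

# With doseq=True, each list item becomes its own key=value pair.
with_doseq = parse.urlencode(query_dict, doseq=True)
print(with_doseq)
```

Note that parse_qs drops parameters with blank values unless keep_blank_values=True is passed, so a query that relies on empty parameters needs that flag as well.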
snakecharmerb
  • Thank you for the suggestion @snakecharmerb. I tried and it seems that there is an issue with non-string sequence or mapping object. I include the exact error for detail. `# non-empty strings will fail this 890 if len(query) and not isinstance(query[0], tuple): --> 891 raise TypeError 892 # Zero-length sequences of all types will get here and succeed, 893 # but that's a minor nit. Since the original implementation TypeError: not a valid non-string sequence or mapping object ` – DAVID LOBO Sep 18 '20 at 19:18
  • Sorry, I missed a step in the answer. It should work now. – snakecharmerb Sep 19 '20 at 06:31