I'm trying to download JSON data via an API. The code is as follows:

import urllib.request, ssl, json

context = ssl._create_unverified_context()
rsbURL = "https://rsbuddy.com/exchange/summary.json"
with urllib.request.urlopen(rsbURL, context=context) as url:
    data = json.loads(url.read().decode('UTF-8'))

This code works fine on my Mac, and I confirmed that `data` contains the expected JSON content. However, when I run the exact same code on Windows, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

What is going on and how do I fix it?

wovano
quyksilver
  • Can you try the code without the `json.loads` and `.decode()` commands? It's possible that your SSL libraries are different between environments. That could cause the url to read an error on one environment (probably the windows one). – SNygard May 21 '19 at 21:49
  • I just tried that, on the Mac I get `b'{"2":{"id":2,"name":"Cannonball","members":true,"sp":5,"buy_average":163,"buy_quantity":276835,"sell_average":161,"sell_quantity":642206,"overall_average":161,"overall_quantity":919041}...`, which is readable as JSON. But on Windows, I get `b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcc\xbd]s$G\x8e%\xfaWhz\xd2\xda\xf4\xae1"\xf2s\xde\xa4R\xb7\xbagZ\ `, which is not readable as JSON. And when I try, the error is the same: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte` – quyksilver May 21 '19 at 21:56
  • The code works for me under Windows. You could try to divide the problem into two sub-problems: downloading the file and decoding the file. You could save the file and then use `open('summary.json', 'rb').read().decode('utf-8')` to see if that works. – wovano May 21 '19 at 22:11

1 Answer

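Looks like the server is sending a gzip-compressed response for some reason (it shouldn't be doing that unless you explicitly set the `Accept-Encoding` header). The leading bytes `\x1f\x8b` in your Windows output are the gzip magic number, and `0x8b` at position 1 is exactly the byte the UTF-8 decoder rejects. You can adapt your code to work with compressed responses like this: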

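import gzip
import urllib.request, ssl, json

# Certificate verification is disabled here, as in the question.
context = ssl._create_unverified_context()
rsbURL = "https://rsbuddy.com/exchange/summary.json"
with urllib.request.urlopen(rsbURL, context=context) as url:
    # Decompress only when the server declares the body as gzip-encoded.
    if url.info().get('Content-Encoding') == 'gzip':
        body = gzip.decompress(url.read())
    else:
        body = url.read()
# json.loads accepts UTF-8 bytes directly (Python 3.6+).
data = json.loads(body)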
Tadeusz Sznuk
  • Possibly related: https://stackoverflow.com/questions/3947120/does-python-urllib2-automatically-uncompress-gzip-data-fetched-from-webpage . Using requests is maybe easier... – Nick T May 21 '19 at 22:20
  • Interesting! Do you actually get compressed data? Because it works for me without compressing. Does this depend on the Python version? Or the server? Or possibly a firewall / virus scanner / VPN / proxy server that modifies the connection? – wovano May 21 '19 at 22:44
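
On the question of whether the data is actually compressed: one way to check without relying on the `Content-Encoding` header is to look at the first two bytes of the body, since a gzip stream always starts with `\x1f\x8b`. The following is only a rough sketch under that assumption, not the answerer's method. The helper name `parse_json_body` and the sample payload are made up for illustration, and the check runs offline on locally compressed bytes rather than against the real server.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_json_body(raw, content_encoding=None):
    """Hypothetical helper: gunzip the body if needed, then parse it as JSON.

    The body is treated as gzip when the header says so, or when it
    starts with the gzip magic number (valid JSON text never does).
    """
    header_says_gzip = (content_encoding or "").strip().lower() == "gzip"
    if header_says_gzip or raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Offline check with a made-up payload shaped like the one in the question.
sample = json.dumps({"2": {"id": 2, "name": "Cannonball"}}).encode("utf-8")
plain = parse_json_body(sample)
compressed = parse_json_body(gzip.compress(sample))
```

With the code from the question, this would be called as `parse_json_body(url.read(), url.info().get('Content-Encoding'))` inside the `with` block. Sniffing the magic bytes also covers the case where something between client and server compresses the body without setting the header, though whether that is what happens here is not established.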