
Banging my head on a problem. I will caveat in advance that this is not reproducible since I cannot share my endpoint. Also, I work as a data scientist, so my knowledge of web technologies is limited.

from urllib.request import Request, urlopen

url = "https://www.some_endpoint.com/"
req = Request(
    url, headers={"API-TOKEN": "some_token"})
json_string = "{"object": "XYZ".....}"

response = urlopen(req, json_string.encode("utf-8"))

I am getting unusual behavior on the urlopen. When my JSON is below 65536 bytes, as shown by evaluating len(json_string.encode('utf-8')), this urlopen call works fine. When it is over that limit, I get an HTTP 500 error.

Is this purely a server-side limit on request size? What is unusual is that when the same large data is passed to the endpoint through a GUI, it works fine. Or is there something I can do to chunk my data into sub-64 KiB pieces for the urlopen? Are there industry standards for handling this?

AZhao

1 Answer


An HTTP 500 error indicates an "internal server error". In theory, this means that the problem is with the server, not with your code.

In practice, an HTTP 500 error can mean almost anything, because many servers will use HTTP 500 as the default error code when a more specific error code is not provided by the programmer. Unfortunately, this means you are reduced to making guesses at how somebody else's code works.

Here are some possible approaches:

  • It's possible that the server has a maximum request size of 64 KiB. You can reduce your request size by using more compact JSON (remove spaces between delimiters) or by using Content-Encoding: gzip.

    import gzip
    import json
    from urllib.request import Request, urlopen

    # Remove whitespace from the JSON
    json_string = json.dumps(
        json.loads(json_string),
        separators=(',', ':'))
    # Compress the request body with gzip
    json_data = gzip.compress(
        json_string.encode('utf-8'))

    req = Request(
        url, headers={"API-TOKEN": "some_token",
                      "Content-Encoding": "gzip"})
    response = urlopen(req, json_data)
    
  • It's possible that there is some way of splitting or chunking the request into multiple, smaller requests (see the batching sketch after this list). This would require knowledge of the exact API you are using.

  • It's possible that there's a bug in the server, or in a proxy somewhere in the chain, that prevents you from sending the request as written. You could try Transfer-Encoding: chunked if Content-Length does not work for bodies over 64 KiB (see the sketch below). It's also possible the server expects Expect: 100-continue, which urllib does not support.
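
On the Transfer-Encoding: chunked point: with Python 3.6+ you don't need extra libraries, because urllib sends a chunked body whenever the request data is an iterable of bytes and no Content-Length header is set. A minimal sketch, reusing the question's placeholder url, token, and json_string:

    from urllib.request import Request, urlopen

    url = "https://www.some_endpoint.com/"  # placeholder from the question

    def iter_chunks(data, size=16 * 1024):
        # Yield the payload in slices; each slice is sent
        # as one chunk of the chunked-encoded body.
        for i in range(0, len(data), size):
            yield data[i:i + size]

    req = Request(url, headers={"API-TOKEN": "some_token"})
    # An iterable body with no Content-Length triggers
    # Transfer-Encoding: chunked (Python 3.6+).
    response = urlopen(req, data=iter_chunks(json_string.encode('utf-8')))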
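
And for the splitting approach: if the endpoint will accept the same data split across several POSTs (an assumption; the real API contract isn't shown here), a minimal client-side batching sketch might look like the following. The url and token are placeholders from the question, and the "records" key is hypothetical, standing in for whatever list-valued field dominates the payload.

    import json
    from urllib.request import Request, urlopen

    url = "https://www.some_endpoint.com/"  # placeholder from the question
    MAX_BYTES = 64 * 1024 - 64  # stay under the apparent 64 KiB ceiling,
                                # leaving room for the JSON envelope

    def post_batch(records):
        # Compact separators keep each request as small as possible.
        body = json.dumps({"records": records},
                          separators=(',', ':')).encode('utf-8')
        req = Request(url, headers={"API-TOKEN": "some_token"}, data=body)
        return urlopen(req)

    # Assumes the bulk of the payload is a list under a
    # hypothetical "records" key; adjust to the actual schema.
    records = json.loads(json_string)["records"]

    batch, batch_size = [], 0
    for record in records:
        # +1 approximates the comma between records in the JSON array.
        record_size = len(json.dumps(
            record, separators=(',', ':')).encode('utf-8')) + 1
        if batch and batch_size + record_size > MAX_BYTES:
            post_batch(batch)
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += record_size
    if batch:
        post_batch(batch)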

If you MITM your GUI client with a tool like Charles, you can see the exact format of the request and make your own request use the same format.

Dietrich Epp
  • thanks for the answer. tried gzip, transfer-encoding; no luck. i will just split on my end locally. – AZhao Oct 24 '18 at 19:59