
I use the requests module with Python 2.7 to POST a large chunk of data to a service I can't change. Since the data is mostly text, it is large but would compress quite well. The server would accept gzip or deflate encoding; however, I do not know how to instruct requests to do a POST and encode the data correctly automatically.

Is there a minimal example available that shows how this is possible?

AME
  • Doesn't look like it's possible; have you looked at [this](http://stackoverflow.com/a/2397242/2137601) and [this](http://stackoverflow.com/a/424948/213760)? – Paul Mougel Dec 06 '13 at 14:16
  • No, but that's not relevant, since I'm not interested in whether it is possible per se (which it is), but whether it is possible with Python's `requests` module. – AME Dec 06 '13 at 14:22
  • Can you post a minimal example of how you are doing it now, *without* compression? I am specifically curious to know if you are using `data=` in the `requests.post()` call. – Robᵩ Dec 06 '13 at 17:24
  • `requests.post(url, params=params_dict, data=json_string, headers=headers_dict)` – AME Dec 09 '13 at 08:22

6 Answers

import json
import zlib

import requests

# Works if backend supports gzip
additional_headers['content-encoding'] = 'gzip'
request_body = zlib.compress(json.dumps(post_data))
r = requests.post('http://post.example.url', data=request_body, headers=additional_headers)
KnightOrc
  • Curious if you know why this doesn't work with the AWS API Gateway. I'm able to test a flask app locally with your recommended change, but when I deploy it to Lambda, the API Gateway delivers HTTP 415 before it reaches the app. Seems your solution is perfectly symmetrical w/ AWS' instructions: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-make-request-with-compressed-payload.html – Scott Smith Jun 17 '20 at 18:09
  • @ScottSmith I got this to work on AWS API Gateway but I had to use Python's `gzip` library instead of `zlib`. So I set `payload = gzip.compress(json.dumps(payload).encode('utf-8'))` and also set the headers: `Content-Type=application/json` and `Content-Encoding=gzip` – tobycoleman Oct 24 '20 at 19:54
  • This code fails on Python 3; looks like a utf-8 encoding is missing. – Jean Carlo Machado Jan 06 '22 at 17:01
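Putting the last two comments together, a Python 3 variant of this answer might look like the sketch below. The payload and URL are placeholders, and the actual POST is shown commented out so the snippet stays self-contained:

```python
import gzip
import json

# import requests  # needed only for the actual POST

post_data = {"message": "hello"}  # placeholder payload

# json.dumps() returns str on Python 3, so encode before compressing
request_body = gzip.compress(json.dumps(post_data).encode('utf-8'))

additional_headers = {
    'Content-Type': 'application/json',
    'Content-Encoding': 'gzip',
}

# r = requests.post('http://post.example.url', data=request_body,
#                   headers=additional_headers)

# Sanity check: the body is real gzip and round-trips
assert request_body[:2] == b'\x1f\x8b'
assert json.loads(gzip.decompress(request_body)) == post_data
```

Note that `gzip.compress` produces a gzip-framed stream, matching the `Content-Encoding: gzip` header, whereas `zlib.compress` in the answer above does not.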

I've tested the solution proposed by Robᵩ with some modifications and it works.

PSEUDOCODE (sorry, I've extrapolated this from my code, so I had to cut out some parts and haven't tested it; anyway, you get the idea)

import gzip
import StringIO  # Python 2; use io.BytesIO on Python 3

import requests

additional_headers['content-encoding'] = 'gzip'
s = StringIO.StringIO()
g = gzip.GzipFile(fileobj=s, mode='w')
g.write(json_body)
g.close()
request_body = s.getvalue()

r = requests.post(endpoint_url, data=request_body, headers=additional_headers)
David Wolever
Marco Grassi
  • It would be better if you could get gzip to write to the socket, rather than writing to a StringIO *then* sending it. – aaa90210 Oct 16 '14 at 22:49

For Python 3:

from io import BytesIO
import gzip

import requests

def zip_payload(payload: str) -> bytes:
    btsio = BytesIO()
    g = gzip.GzipFile(fileobj=btsio, mode='w')
    g.write(bytes(payload, 'utf8'))
    g.close()
    return btsio.getvalue()

headers = {
    'Content-Encoding': 'gzip'
}
zipped_payload = zip_payload(payload)
requests.post(url, data=zipped_payload, headers=headers)

James D
  • It is possible to simplify compression with this one-liner: `zipped_payload = gzip.compress("Hello world".encode('utf-8'))`. – illagrenan Sep 06 '19 at 11:49

I needed my posts to be chunked, since I had several very large files being uploaded in parallel. Here is a solution I came up with.

import requests
import zlib

def chunked_read_and_compress(file_to_send, zlib_obj, chunk_size):
    """Generator that reads a file in chunks and compresses them."""
    compression_incomplete = True
    with open(file_to_send, 'rb') as f:
        # zlib might not give us any data back for a given chunk, so there is
        # nothing to yield; just run another loop until we get data to yield.
        while compression_incomplete:
            plain_data = f.read(chunk_size)
            if plain_data:
                compressed_data = zlib_obj.compress(plain_data)
            else:
                compressed_data = zlib_obj.flush()
                compression_incomplete = False
            if compressed_data:
                yield compressed_data

def post_file_gzipped(url, file_to_send, chunk_size=5*1024*1024, compress_level=6, headers=None, requests_kwargs=None):
    """Post a file to a url, content-encoded as gzip and chunked (for large files)."""
    headers_to_send = {'Content-Encoding': 'gzip'}
    headers_to_send.update(headers or {})
    zlib_obj = zlib.compressobj(compress_level, zlib.DEFLATED, 31)
    return requests.post(url, data=chunked_read_and_compress(file_to_send, zlib_obj, chunk_size),
                         headers=headers_to_send, **(requests_kwargs or {}))

resp = post_file_gzipped('http://httpbin.org/post', 'somefile')
resp.raise_for_status()
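Without a server to hand, the generator approach above can be sanity-checked locally: compressing a temporary file in chunks with `wbits=31` (gzip framing) must yield a stream that `gzip.decompress` turns back into the original bytes. A self-contained sketch, with arbitrary file contents and chunk size:

```python
import gzip
import os
import tempfile
import zlib

def chunked_compress(path, chunk_size=1024):
    # Same idea as chunked_read_and_compress above: wbits=31 selects a
    # gzip-wrapped stream, suitable for Content-Encoding: gzip.
    zobj = zlib.compressobj(6, zlib.DEFLATED, 31)
    with open(path, 'rb') as f:
        while True:
            plain = f.read(chunk_size)
            if not plain:
                break
            out = zobj.compress(plain)
            if out:
                yield out
    yield zobj.flush()

original = os.urandom(10_000) + b'text ' * 2000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original)
    path = tmp.name

compressed = b''.join(chunked_compress(path))
assert gzip.decompress(compressed) == original
os.unlink(path)
```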
Rosco

I can't get this to work, but you might be able to insert the gzip data into a prepared request:

# UNPROVEN
import gzip
import StringIO  # Python 2
import requests

r = requests.Request('POST', 'http://httpbin.org/post', data={"hello": "goodbye"})
p = r.prepare()
s = StringIO.StringIO()
g = gzip.GzipFile(fileobj=s, mode='w')
g.write(p.body)
g.close()
p.body = s.getvalue()
p.headers['content-encoding'] = 'gzip'
p.headers['content-length'] = str(len(p.body))  # Not sure about this
r = requests.Session().send(p)
Robᵩ

The accepted answer is probably wrong due to an incorrect or missing compression header (meaning the zlib/gzip framing of the compressed body, not the HTTP headers):

additional_headers['content-encoding'] = 'gzip'
request_body = zlib.compress(json.dumps(post_data))

Using the zlib module's compressobj method, which provides the wbits argument to specify the header format, should work. The default value is MAX_WBITS = 15, which means the zlib header format; this is correct for Content-Encoding: deflate. For the compress method this argument is not available, and unfortunately the documentation does not mention which header (if any) is used.

For Content-Encoding: gzip, wbits should be 16 + (9 to 15), so 16 + zlib.MAX_WBITS is a good choice.

I checked how urllib3 decodes the response for these two cases: it implements a trial-and-error mechanism for deflate (it tries both the raw and zlib header formats). That could explain why some people had problems with the solution from the accepted answer while others didn't.


tl;dr

gzip

additional_headers['Content-Encoding'] = 'gzip'
compress = zlib.compressobj(wbits=16+zlib.MAX_WBITS)
body = compress.compress(data) + compress.flush()

deflate

additional_headers['Content-Encoding'] = 'deflate'
compress = zlib.compressobj()
body = compress.compress(data) + compress.flush()
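The difference between the two framings is easy to observe locally: a gzip stream starts with the magic bytes 1f 8b, while a default zlib stream starts with 0x78. A quick check:

```python
import zlib

data = b'example payload ' * 100

# gzip framing: wbits = 16 + MAX_WBITS
gz = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
gzip_body = gz.compress(data) + gz.flush()
assert gzip_body[:2] == b'\x1f\x8b'          # gzip magic number
assert zlib.decompress(gzip_body, 16 + zlib.MAX_WBITS) == data

# zlib framing (default): suitable for Content-Encoding: deflate
zl = zlib.compressobj()
deflate_body = zl.compress(data) + zl.flush()
assert deflate_body[0] == 0x78               # zlib header byte (CMF)
assert zlib.decompress(deflate_body) == data
```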
Florian