
I have run into an issue with using API Gateway as a proxy to S3 (for custom authentication), in that it does not handle binary data well (which is a known issue).

I'm usually uploading either .gz or .Z (Unix compress utility) files. As far as I understand it, the binary data is not preserved in transit due to encoding issues, and I can't figure out a way to decode it back to the original bytes.

Original leading bytes: \x1f\x8b\x08\x08\xb99\xbeW\x00\x03

After passing through API GW: ��9�W�

... Followed by filename and the rest of the data.
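
For reference, those leading bytes parse as a standard gzip member header (RFC 1952). A minimal sketch of the breakdown:

```python
import struct

header = b"\x1f\x8b\x08\x08\xb99\xbeW\x00\x03"

# <HBBIBB = little-endian: magic (2), method (1), flags (1), mtime (4), xfl (1), os (1)
magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", header)

print(hex(magic))  # 0x8b1f -> the \x1f\x8b gzip magic
print(method)      # 8 -> deflate
print(flags)       # 8 -> FNAME bit set: the original filename follows the header
print(mtime)       # 1472084409 -> a late-August-2016 Unix timestamp
print(os_id)       # 3 -> Unix
```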

One way of 'getting around this' is to set the Content-Encoding header of the PUT request to API GW to 'gzip'. This seems to force API GW to decompress the file before forwarding it to S3.
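
For illustration, the kind of request I mean (a sketch only; the endpoint, bucket, and file names are placeholders):

```python
import requests  # third-party HTTP client

with open("data.gz", "rb") as f:
    body = f.read()

# Hypothetical API Gateway endpoint fronting S3.
resp = requests.put(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/my-bucket/data.gz",
    data=body,
    headers={
        "Content-Type": "application/x-gzip",
        "Content-Encoding": "gzip",  # this is what seems to trigger the decompression
    },
)
print(resp.status_code)
```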

The same does not work for .Z files compressed with the Unix compress utility, for which you would specify the Content-Encoding as 'compress'.

Does anyone have any insight into what is happening to the data, to help shed some light on my issue? Also, does anyone know of any possible workarounds to preserve my data as it passes through API GW (or to decode it once it's in S3)?

Obviously I could just access the S3 API directly (or have API GW return a pre-signed URL for accessing the S3 API), but there are a few reasons why I don't want to do that.

I should mention that I don't understand very much at all about encoding - sorry if there are some obvious answers to some of my questions.

unclemeat

1 Answer


It's not exactly an "encoding issue" -- it's the fact that API Gateway just doesn't support binary data ("yet")... so it's going to potentially corrupt binary data, depending on the specifics of the data in question.
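
You can reproduce the mangling shown in the question by decoding those bytes as UTF-8 text, which is roughly what a pipeline that isn't 8-bit clean does to a binary body (a sketch; I'm assuming replacement-character substitution, which matches the � characters in the question):

```python
raw = b"\x1f\x8b\x08\x08\xb99\xbeW\x00\x03"

# Bytes that aren't valid UTF-8 sequences (\x8b, \xb9, \xbe here) become
# U+FFFD replacement characters; ASCII bytes like "9" and "W" survive intact.
print(repr(raw.decode("utf-8", errors="replace")))
# '\x1f\ufffd\x08\x08\ufffd9\ufffdW\x00\x03'
```

Once the replacement has happened, the original byte values are gone, which is why there's no way to decode the data back to binary after the fact.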

Uploading as Content-Encoding: gzip probably triggers decoding in a front-end component that is capable of dealing with binary data (gzip, after all, is a standard encoding and is binary) before passing the request body to the core infrastructure... but you will almost certainly find that this is a workaround that does not consistently deliver correct results, depending on the specific payload. The fact that it works at all seems more like a bug than a feature.

For now, the only consistently viable option is base64-encoding your payload, which increases its size on the wire by about 33% (base64 produces 4 bytes of output for every 3 bytes of input), so it's not much of a solution. Base64 + gzip with the appropriate Content-Encoding: gzip should also work; it seems like quite a silly suggestion (converting a compressed file to base64, then gzipping the result to try to reduce its size on the wire), but it should be consistent with what API Gateway can currently deliver.
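
A sketch of the client side of that combination, using only the standard library (the second gzip pass just claws back some of the base64 overhead on the wire):

```python
import base64
import gzip

with open("data.gz", "rb") as f:  # hypothetical file name
    raw = f.read()

b64 = base64.b64encode(raw)   # 4 bytes out per 3 bytes in: ~33% larger
wire = gzip.compress(b64)     # send this body with Content-Encoding: gzip

print(len(raw), len(b64), len(wire))
```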

Michael - sqlbot
  • Thanks for your response. I'm currently working on transforming the data using `$util.base64Encode($input.body)` in a Body Mapping Template for anything with the Content-Type `application/x-gzip` (will do the same for `application/compress`). But I can't seem to get it to simply transform and pass that through to a file. I'm using the default `Method Request passthrough` template, and changing the 'body-json' using `$util.base64Encode`. It just ends up storing the JSON in the file. Though the 'body-json' field does contain the transformed data. – unclemeat Aug 29 '16 at 03:42
  • Nevermind - I see that you just need to do `$util.base64Encode($input.body)` with no other information / json. Thanks again. – unclemeat Aug 29 '16 at 03:50
  • I was thinking encoding on the client, submitting as a urlencoded form, and decoding from an input parameter in the Lambda function, leaving API Gateway blissfully unaware of what you were doing, not participating in the transformation (see the sketch after this thread). Frankly, it seems like potentially more trouble than it's worth until the platform is 8-bit clean and octet agnostic. – Michael - sqlbot Aug 29 '16 at 04:04
  • That is pretty unfortunate. – unclemeat Aug 29 '16 at 04:48
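
A minimal sketch of the client-side-encoding idea from that comment (the event shape and the "payload" field name are assumptions; the real names depend on how the method request is mapped to the Lambda function):

```python
import base64

def handler(event, context):
    # Hypothetical: the client base64-encoded the file and submitted it as a
    # form/JSON field named "payload", so API Gateway only ever sees text.
    raw = base64.b64decode(event["payload"])
    # ... write `raw` to S3 (e.g. with boto3) ...
    return {"bytesReceived": len(raw)}
```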