
I'm fetching a PDF from a third-party API. The response content type is `application/octet-stream`. I then upload it to S3, but if I download the newly written file from S3, the pages are blank when viewed in Chromium and Adobe Acrobat. The file is not zero bytes and has the correct number of pages.

Using the `binary` encoding gives me a file size closest to the actual file size, but it's still not exact; the result is slightly smaller.
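For illustration (synthetic bytes, not my actual PDF), round-tripping raw bytes through a decoded string mangles them, which may be related:

const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xc3, 0x28]); // '%PDF' plus an invalid utf-8 sequence
const decoded = original.toString('utf8');          // request decodes the body to a utf-8 string by default
const roundTripped = Buffer.from(decoded, 'binary');
console.log(original.equals(roundTripped));         // false – the bytes no longer match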

The API request (using the request-promise module):

import { get } from 'request-promise';
import { S3 } from 'aws-sdk';

const payload = await get('someUrl').catch(handleError);

const buffer = Buffer.from(payload, 'binary');
const result = await new S3().upload({
  Body: buffer,
  Bucket: 'somebucket',
  ContentType: 'application/pdf',
  ContentEncoding: 'binary',
  Key: 'somefile.pdf'
}).promise();

Additionally, downloading the file from Postman also results in a file with blank pages. Does anybody know where I am going wrong here?

  • Note that `binary` isn't a [valid value](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding) for `ContentEncoding`, so specifying `binary` should be equivalent to specifying nothing at all, there. Valid values are `gzip` `br` `deflate` `compress` and `identity` (which is equivalent to specifying no value at all). This should be set equivalent to whatever the original service is sending to you as `Content-Encoding`, unless you can prove that they are setting it incorrectly, as they appear to be doing with `Content-Type` -- but that seems unlikely. – Michael - sqlbot Feb 26 '19 at 17:57
  • `Buffer.from(payload, 'binary');` ... so, `payload` is originally a string? That seems potentially problematic, but I guess it depends on how you are doing the download. We probably need to see that code. Based on currently available info, the download seems more likely than the upload to be where the original problem is arising. – Michael - sqlbot Feb 26 '19 at 18:06
  • @Michael-sqlbot, thanks for the insight. I've updated the question. I do realise now that I am not reading the whole response payload... my download would be grabbing only the first chunk, I presume. – ethane Feb 26 '19 at 19:09

1 Answer

As @Michael - sqlbot mentioned in the comments, the download was the issue: I wasn't getting the entire byte stream from the API.

Changing `const payload = await get('someUrl').catch(handleError);`

to

import * as request from 'request'; // notice I've imported the base request lib

const bufferArray = [];

request.get('someUrl')
  .on('response', (res) => {

    res.on('data', (chunk) => {
      bufferArray.push(chunk); // each chunk is already a Buffer; save it for now
    });

    res.on('end', () => {
      const dataBuffer = Buffer.concat(bufferArray); // this now contains all my data
      // send to s3
    });
  });
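To complete the `// send to s3` step, a minimal sketch that reuses the upload call from the question, now fed the complete buffer (`somebucket` and `somefile.pdf` are the question's placeholders; `ContentEncoding` is dropped since, per the comments, `binary` isn't a valid value):

import { S3 } from 'aws-sdk';

const sendToS3 = async (dataBuffer) => {
  // same upload call as in the question, minus the invalid ContentEncoding
  return new S3().upload({
    Body: dataBuffer,
    Bucket: 'somebucket',
    ContentType: 'application/pdf',
    Key: 'somefile.pdf'
  }).promise();
};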

Note: streaming responses with the request-promise library is not recommended, as outlined in its documentation, so I used the base request library instead.

https://github.com/request/request-promise#api-in-detail
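As an aside, if streaming isn't actually needed and the whole body fits in memory, the request documentation also describes an `encoding: null` option, which makes the body resolve as a raw Buffer instead of a decoded string. A sketch of that simpler variant (untested here, so treat it as an assumption):

import { get } from 'request-promise';

// encoding: null tells request not to decode the response body,
// so payload arrives as a Buffer rather than a (lossy) string
const payload = await get({ uri: 'someUrl', encoding: null }).catch(handleError);
// payload can then be passed directly to S3 as Body – no Buffer.from needed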
