52

Using the aws-sdk module and Express 4.13, it's possible to proxy a file from S3 in a number of ways.

This callback version will return the file body as a buffer, plus other relevant headers like Content-Length:

function(req,res){

  var s3 = new AWS.S3();

  s3.getObject({Bucket: myBucket, Key: myFile},function(err,data){

    if (err) {
      return res.status(500).send("Error!");
    }

    // Headers
    res.set("Content-Length",data.ContentLength)
       .set("Content-Type",data.ContentType);

    res.send(data.Body); // data.Body is a buffer

  });

}

The problem with this version is that you have to buffer the entire file in memory before sending it, which is not great, especially if it's something large like a video.

This version will directly stream the file:

function(req,res){

  var s3 = new AWS.S3();

  s3.getObject({Bucket: myBucket, Key: myFile})
    .createReadStream()
    .pipe(res);

}

But unlike the first one, it won't set any of the headers a browser might need to handle the file properly.

Is there a way to get the best of both worlds, passing through the correct headers from S3 but sending the file as a stream? It could be done by first making a HEAD request to S3 to get the metadata, but can it be done with one API call?

NChase

4 Answers

39

One approach is to listen for the httpHeaders event and create a stream within it.

s3.getObject(params)
    .on('httpHeaders', function (statusCode, headers) {
        // Forward the headers the browser needs before any data is written
        res.set('Content-Length', headers['content-length']);
        res.set('Content-Type', headers['content-type']);
        // `this` is the AWS.Request; stream the raw HTTP body without buffering
        this.response.httpResponse.createUnbufferedStream()
            .pipe(res);
    })
    .send();
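
For context, here's a hedged sketch of how this snippet might sit inside an Express route; the route path and bucket/key values are placeholders I've assumed, not part of the answer:

const express = require('express');
const AWS = require('aws-sdk');

const app = express();
const s3 = new AWS.S3();

app.get('/file', (req, res) => {
    const params = { Bucket: 'my-bucket', Key: 'my-file' }; // placeholders
    s3.getObject(params)
        .on('httpHeaders', function (statusCode, headers) {
            res.set('Content-Length', headers['content-length']);
            res.set('Content-Type', headers['content-type']);
            // `function` (not an arrow) so that `this` is the AWS.Request
            this.response.httpResponse.createUnbufferedStream().pipe(res);
        })
        .send();
});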
André Werlang
  • my problem is js files are always coming back as text/html content-type? – allencoded Aug 17 '20 at 00:53
  • @andré-werlang what is `this.response.` ? is it a scoped object inside callback function? – Oleg Abrazhaev Jul 07 '21 at 13:59
  • 1
    @OlegAbrazhaev refer to AWS SDK docs for further information: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property – André Werlang Jul 07 '21 at 18:30
  • The problem with this solution is that it copies the data to the server first. Not practical for files in the gigabyte range. The solution from cdhowie is better. Anyway, thanks for sharing the code! Only then could something better come about. :) – user1791139 Jun 08 '22 at 14:49
  • @user1791139 this solution uses `createUnbufferedStream()` which avoids buffering. The other solutions use `createReadStream()` which pipes `createUnbufferedStream()` into 2 `PassThrough` streams. This one is the most efficient. In any case data is copied to the server kernel and user space memory. – André Werlang Jun 10 '22 at 03:27
  • @user1791139 I'd love if you could share your benchmark or analysis though. – André Werlang Jun 10 '22 at 03:29
26

For my project, I simply do a headObject to retrieve only the object metadata (it's really fast and avoids downloading the object). Then I add to the response all the headers I need to propagate for the piping:

var mime = require('mime-types'); // missing from the original snippet; any mime module with .lookup() works
var s3 = new AWS.S3();

var params = {
    Bucket: bucket,
    Key: key
};
s3.headObject(params, function (err, data) {
    if (err) {
        // an error occurred
        console.error(err);
        return next();
    }
    var stream = s3.getObject(params).createReadStream();

    // forward errors by continuing to the next middleware
    stream.on('error', function error(err) {
        return next();
    });

    // Add the content type to the response (it's not propagated from the S3 SDK)
    res.set('Content-Type', mime.lookup(key));
    res.set('Content-Length', data.ContentLength);
    // Last-Modified must be an HTTP date string, not a Date object
    res.set('Last-Modified', data.LastModified.toUTCString());
    res.set('ETag', data.ETag);

    stream.on('end', () => {
        console.log('Served by Amazon S3: ' + key);
    });
    // Pipe the S3 object to the response
    stream.pipe(res);
});
Mathieu Seiler
  • 1
    So you prefer to make two HTTP requests instead of one? – Diligent Key Presser Dec 20 '16 at 01:02
  • 1
    Hi, I am new to streaming and was going through all the answers. @André Werlang's takes 5 sec and this one 10 sec for a specific image in the same env on average. No way am I saying this answer is bad or wrong, I am just curious why? Is it because of the 2 API calls? – mukuljainx May 16 '21 at 07:30
  • Actually a 2-call implementation can be worse than 1 call when your client is located far from the S3 bucket cluster (due to the network latency) or if you are dealing with small files. The "one approach" solution proposed above is anyway my favorite (my proposition does not fit all use cases) – Mathieu Seiler Sep 22 '21 at 10:27
  • The code works but I get the error message: Error [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client . But after stream.pipe(res) I don't put headers anymore. Does anyone know if stream.pipe() itself still sets a header? – user1791139 May 27 '22 at 11:39
23

Building on André Werlang's answer, we have done the following to augment AWS Request objects with a forwardToExpress method:

const _ = require('lodash');
const AWS = require('aws-sdk');

AWS.Request.prototype.forwardToExpress = function forwardToExpress(res, next) {
    this
    .on('httpHeaders', function (code, headers) {
        // Only forward headers for successful responses
        if (code < 300) {
            res.set(_.pick(headers, 'content-type', 'content-length', 'last-modified'));
        }
    })
    .createReadStream()
    .on('error', next)
    .pipe(res);
};

Then, in our route handlers, we can do something like this:

s3.getObject({Bucket: myBucket, Key: myFile}).forwardToExpress(res, next);
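
For completeness, here's a hedged sketch of a full route using the helper; the route path and bucket name are assumptions, not from the answer:

const express = require('express');

const app = express();
const s3 = new AWS.S3(); // AWS and the prototype patch come from the snippet above

app.get('/files/:key', (req, res, next) => {
    // Stream and S3 errors are forwarded to Express's error middleware via `next`
    s3.getObject({ Bucket: 'my-bucket', Key: req.params.key }).forwardToExpress(res, next);
});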
cdhowie
  • 2
    This is essentially what we do. I would suggest that the set of headers you want to propagate (assuming you want to support range requests and eTag based client caching) are probably: ['accept-ranges', 'content-length', 'content-range', 'content-type', 'etag', 'last-modified']. – BobDickinson Aug 29 '17 at 18:21
  • 4
    Don’t forget to add `res.status(code)` above `if (code < 300)` to pass along whatever status code was returned from S3, for cases like status 206 (partial response, such as for a video). – Geoffrey Booth Jun 26 '18 at 17:57
  • This looks great although it forces my browser to download the html file (from s3) instead of displaying it. Do you know how to fix this? – Mulhoon Jul 13 '18 at 19:28
  • 2
    @Mulhoon This code uses the same content-type value that was used to store the object in S3. HTML files would need to be stored in S3 with the content-type header `text/html`. If you uploaded to S3 using the AWS console in your browser, it probably used the content-type `application/octet-stream`, which is the universal "this is a sequence of bytes, but that's all I know." Browsers react to this by downloading the file instead of displaying it. So either (1) store the right content-type in S3 (preferable) or (2) use `res.set('content-type', 'text/html')` to force it in Express. – cdhowie Jul 15 '18 at 18:24
  • 1
    Thanks @cdhowie - It was the headers. I ended up using [mime-types](https://www.npmjs.com/package/mime-types) to determine the right content-type – Mulhoon Jul 17 '18 at 09:31
  • @cdhowie How can I give a name to the downloaded file? I mean the key in the bucket. – Bharat Bhushan Feb 23 '19 at 06:38
  • Does this support partial byte (range) requests? Safari doesn't work if not – Sharan Mohandas Nov 22 '19 at 10:22
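
Folding the two comment suggestions above into the helper (forwarding S3's status code so range requests work, and passing the wider header set), a hedged variant might look like:

AWS.Request.prototype.forwardToExpress = function forwardToExpress(res, next) {
    this
    .on('httpHeaders', function (code, headers) {
        // Propagate S3's status, e.g. 206 for partial (range) responses
        res.status(code);
        if (code < 300) {
            res.set(_.pick(headers,
                'accept-ranges', 'content-length', 'content-range',
                'content-type', 'etag', 'last-modified'));
        }
    })
    .createReadStream()
    .on('error', next)
    .pipe(res);
};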
8

Here is a 2022 solution using the AWS JS SDK v3 client. It uses the S3 client to stream the object data to the response stream 'res'. A simple call to GetObjectCommand returns only the raw data of the object, but neither the content length nor the tags, so prior to streaming the object data, one call is made to fetch metadata like the ETag and Content-Type, and one optional call is made to fetch the object tags and forward them to the UI.

It's up to you to customize it for your project.

// HeadObjectCommand and GetObjectTaggingCommand are used below; the client must also be instantiated
const { S3Client, GetObjectCommand, HeadObjectCommand, GetObjectTaggingCommand } = require("@aws-sdk/client-s3");

const s3Client = new S3Client({}); // region/credentials resolved from the environment

/**
 * 
 * @param {*} res 
 * @param {string} bucketName Bucket Name
 * @param {string} key Object key
 * @param {number} cacheExpiration Cache expiration in ms
 * @param {boolean} streamTags Forward object tags in http-headers
 */
async function streamGetObject(res, bucketName, key, cacheExpiration, streamTags) {
    try {
        const params = {
            Bucket: bucketName,
            Key: key,
        };
        // Head the object to get the bare minimum http-header information
        const headResponse = await s3Client.send(new HeadObjectCommand(params));
        res.set({
            "Content-Length": headResponse.ContentLength,
            "Content-Type": headResponse.ContentType,
            "ETag": headResponse.ETag,
        });
        // Get the object tags (optional)
        if (streamTags === true) {
            const taggingResponse = await s3Client.send(new GetObjectTaggingCommand(params));
            taggingResponse.TagSet.forEach((tag) => {
                res.set("X-TAG-" + tag.Key, tag.Value);
            });
        }
        // Prepare cache headers
        if (typeof cacheExpiration === "number") {
            res.setHeader("Cache-Control", "public, max-age=" + cacheExpiration / 1000);
            res.setHeader("Expires", new Date(Date.now() + cacheExpiration).toUTCString());
        } else {
            res.setHeader("Pragma", "no-cache");
            res.setHeader("Cache-Control", "no-cache");
            res.setHeader("Expires", 0);
        }

        // Now get the object data and stream it
        const response = await s3Client.send(new GetObjectCommand(params));
        const stream = response.Body;
        stream.on("data", (chunk) => res.write(chunk));
        stream.once("end", () => {
            res.end();
        });
        stream.once("error", () => {
            res.end();
        });
    } catch (err) {
        console.log("Error", err);
        throw err;
    }
}
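
As a usage sketch, a caller might wire this into an Express route roughly as follows; the route, bucket name, and one-hour cache lifetime are illustrative assumptions, not part of the answer:

const express = require("express");
const app = express();

app.get("/files/:key", async (req, res) => {
    try {
        // Cache for one hour and forward object tags as X-TAG-* headers
        await streamGetObject(res, "my-bucket", req.params.key, 60 * 60 * 1000, true);
    } catch (err) {
        // streamGetObject rethrows; only send an error if nothing has been written yet
        if (!res.headersSent) {
            res.status(500).send("Error!");
        }
    }
});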
Mathieu Seiler
  • 1
    For anyone wondering. s3Client came from const { S3, CreateBucketCommand, PutObjectCommand, GetObjectCommand, DeleteObjectCommand, DeleteBucketCommand, } = require("@aws-sdk/client-s3"); – LeoPucciBr Jan 14 '22 at 19:38