21

I have an app where users' photos are private. I store the photos (and their thumbnails) in AWS S3. There is a page on the site where a user can view their photos (i.e. the thumbnails). My problem is how to serve these files. The options I have evaluated are:

  • Serving the files from CloudFront (or S3) using signed URLs. The problem is that every time the user refreshes the page, I have to generate all those signed URLs again and load them, so the browser can't cache the images, which would have been a good choice. Is there still a way to do this in JavaScript? I can't give the URLs a longer validity for security reasons. Secondly, if someone gets hold of a URL within that time frame, they can view the file without going through the app's authentication.
  • The other option is to serve the files from my Express app itself, streaming them from S3. This lets me set HTTP cache headers and therefore enables browser caching. It also makes sure no one can view a file without being authenticated. Ideally, since I am hosting behind an NGINX proxy, I would like to hand the streaming off to NGINX. But as far as I can see, that is only possible if the file exists on the same machine's filesystem. Here I have to stream the file from S3 and finish the response once the stream is complete, and I don't want to store the files locally.

I am not able to decide which of the two options would be the better choice. I want to offload as much work as possible to S3 or CloudFront, but even with signed URLs the request hits my servers first. I also want caching.

So what would be the ideal way to do this? Please also answer the specific questions about each method.

ThinkingInBits
Saransh Mohapatra

4 Answers

23

i would just stream it from S3. it's very easy, and signed URLs are much more difficult. just make sure you set the content-type and content-length headers when you upload the images to S3.

var aws = require('knox').createClient({
  key: '',
  secret: '',
  bucket: ''
})

app.get('/image/:id', function (req, res, next) {
  if (!req.user.is.authenticated) {
    var err = new Error()
    err.status = 403
    next(err)
    return
  }

  aws.get('/image/' + req.params.id)
  .on('error', next)
  .on('response', function (resp) {
    if (resp.statusCode !== 200) {
      var err = new Error()
      err.status = 404
      next(err)
      return
    }

    res.setHeader('Content-Length', resp.headers['content-length'])
    res.setHeader('Content-Type', resp.headers['content-type'])

    // cache-control?
    // etag?
    // last-modified?
    // expires?

    if (req.fresh) {
      res.statusCode = 304
      res.end()
      return
    }

    if (req.method === 'HEAD') {
      res.statusCode = 200
      res.end()
      return
    }

    resp.pipe(res)
  })
})
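The four commented-out questions in the handler above (cache-control, etag, last-modified, expires) could be filled in roughly like this. It is only a sketch: cachingHeaders is a made-up helper name, the max-age value is an arbitrary choice, and "private" is my assumption for photos that must not end up in shared caches. Note that Express's req.fresh compares the request against the ETag and Last-Modified headers already set on the response, so they have to be set before that check.

```javascript
// Hypothetical helper: build the caching headers for a proxied S3 response.
// `upstream` is the header object of the S3 response (resp.headers above).
function cachingHeaders(upstream, maxAgeSeconds, now) {
  var headers = {
    // "private" keeps shared caches (proxies, CDNs) from storing a user's photo
    'Cache-Control': 'private, max-age=' + maxAgeSeconds,
    // Expires is an absolute date, so it is computed for each response
    'Expires': new Date(now.getTime() + maxAgeSeconds * 1000).toUTCString()
  }
  // Pass S3's validators through unchanged so conditional requests can match
  if (upstream.etag) headers['ETag'] = upstream.etag
  if (upstream['last-modified']) headers['Last-Modified'] = upstream['last-modified']
  return headers
}

// Inside the 'response' handler this would run before the req.fresh check:
//   var h = cachingHeaders(resp.headers, 86400, new Date())
//   Object.keys(h).forEach(function (name) { res.setHeader(name, h[name]) })

var example = cachingHeaders(
  { etag: '"abc123"', 'last-modified': 'Mon, 15 Jul 2013 10:00:00 GMT' },
  3600,
  new Date(Date.UTC(2013, 6, 16, 6, 0, 0))
)
console.log(example)
```

With a fixed clock as in the example, the Expires value comes out one hour after the given time, and the ETag and Last-Modified values are simply copied from S3.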
Jonathan Ong
  • 1
    I will go with your answer for the example. Thanks a lot. But if you don't mind, could you help me with one more thing: is it better to use AWS-SDK or knox? – Saransh Mohapatra Jul 16 '13 at 06:14
  • 1
    Never tried aws-sdk. Knox maintainers are more involved with the node community, however. – Jonathan Ong Jul 16 '13 at 06:41
  • 1
    However?? You didn't mention however what? – Saransh Mohapatra Jul 16 '13 at 23:51
  • 2
    "However, knox maintainers are more involved with the node community." – Jonathan Ong Jul 17 '13 at 02:19
  • Ok. According to Leonid's answer, it's slower to serve files directly from the Express app and faster to redirect to the generated URL. Do you think what he is saying is correct? – Saransh Mohapatra Jul 17 '13 at 07:23
  • Depends if you want control over headers. Really, the difference in speed is minimal. – Jonathan Ong Jul 17 '13 at 07:28
  • 11
    Glad we could all get our grammar lesson here on SO @SaranshMohapatra & JonathanOng. Thanks for your valuable contributions. – theflowersoftime Jul 10 '14 at 15:45
  • 1
    Anyone know of any recent NodeJs conventions/libraries for streaming files to/from S3? Knox is no longer maintained in 2022 (and is actually broken due to a change in the mime package), and I get the sense there must be something else. – Tom Jun 30 '22 at 00:22
9

If you redirect the user to a signed URL using 302 Found, the browser will cache the resulting image according to its Cache-Control header and won't request it a second time.

To prevent the browser from caching the redirect to the signed URL itself, you should send a proper Cache-Control header along with it:

Cache-Control: private, no-cache, no-store, must-revalidate

So the next time, it'll send a request to the original URL and will be redirected to a new signed URL.

You can generate a signed URL with knox using its signedUrl method.
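To make the redirect approach concrete, here is a minimal sketch using only Node's crypto module. The signUrl and signedRedirect helpers, the example-bucket.invalid host and the demo-secret key are all made up for illustration, and the signing scheme is a generic HMAC rather than AWS's actual algorithm. In a real app you would call knox's signedUrl (or your SDK's equivalent) instead.

```javascript
var crypto = require('crypto')

// Hypothetical signer: NOT AWS's exact algorithm, only the same idea
// (an HMAC over the path and expiry, keyed with a server-side secret).
function signUrl(baseUrl, path, expiresEpoch, secret) {
  var signature = crypto
    .createHmac('sha256', secret)
    .update(path + '\n' + expiresEpoch)
    .digest('hex')
  return baseUrl + path + '?Expires=' + expiresEpoch + '&Signature=' + signature
}

// Describe the redirect: a 302 to the signed URL, with headers that stop the
// browser from caching the redirect itself.
function signedRedirect(baseUrl, path, ttlSeconds, secret, nowMs) {
  var expires = Math.floor(nowMs / 1000) + ttlSeconds
  return {
    statusCode: 302,
    headers: {
      'Location': signUrl(baseUrl, path, expires, secret),
      'Cache-Control': 'private, no-cache, no-store, must-revalidate'
    }
  }
}

// In an Express handler (after the authentication check) this might be used as:
//   var r = signedRedirect(BASE, '/image/' + req.params.id, 60, SECRET, Date.now())
//   res.set(r.headers).status(r.statusCode).end()

var first = signedRedirect('https://example-bucket.invalid', '/image/42', 60, 'demo-secret', 1000000)
var later = signedRedirect('https://example-bucket.invalid', '/image/42', 60, 'demo-secret', 1005000)
console.log(first.headers.Location)
console.log(first.headers.Location === later.headers.Location) // false: new expiry, new URL
```

The last line shows that two requests made five seconds apart get different signed URLs, because the expiry is part of what gets signed.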

But don't forget to set proper headers on every uploaded image. I'd recommend using both Cache-Control and Expires headers, because some browsers have no support for the Cache-Control header, while Expires only lets you set an absolute expiration time.

With the second option (streaming images through your app) you'll have better control over the situation. For example, you'll be able to generate the Expires header for each response according to the current date and time.

But what about speed? Using signed URLs has two advantages which may affect page load speed.

First, you won't overload your server. Generating signed URLs is fast because you're just computing a hash keyed with your AWS credentials. To stream images through your server, on the other hand, you'll need to maintain a lot of extra connections during the page load. Anyway, it won't make any real difference unless your server is heavily loaded.

Second, browsers keep only a small number of parallel connections per hostname during page load (the HTTP/1.1 specification suggested two; browsers typically allow a few more). So with signed URLs, the browser will keep resolving image URLs against your server in parallel with downloading the images from S3. It'll also keep the image downloads from blocking the download of any other resources.

Anyway, to be absolutely sure you should run some benchmarks. My answer is based on my knowledge of the HTTP specification and on my experience in web development, but I have never tried to serve images that way myself. Serving public images with a long cache lifetime directly from S3 increases page speed, and I believe the situation won't change if you do it through redirects.

And you should keep in mind that streaming images through your server will bring all the benefits of Amazon CloudFront to naught. But as long as you're serving content directly from S3, both options will work fine.

Thus, there are two cases in which using signed URLs should speed up your page:

  • If you have a lot of images on a single page.
  • If you're serving images using CloudFront.

If you have only a few images on each page and serve them directly from S3, you probably won't see any difference at all.

Important Update

I ran some tests and found that I was wrong about caching. It's true that browsers cache the images they were redirected to. But the browser associates the cached image with the URL it was redirected to, not with the original one. So when the browser loads the page a second time, it requests the image from the server again instead of fetching it from the cache. Of course, if the server responds with the same redirect URL as the first time, the browser will use its cache, but that's not the case for signed URLs.
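The behaviour described above can be modelled with a toy cache. This is not how any real browser is implemented, just an illustration of the keying: makeBrowser, the cdn.invalid host and the counter standing in for a signature are all invented for the example.

```javascript
// Toy model of a browser cache: entries are keyed by the URL the image was
// finally fetched from, not by the URL that issued the redirect.
function makeBrowser(resolveRedirect) {
  var cache = {}       // final URL -> image bytes
  var downloads = 0    // how many times the image body was actually transferred
  return {
    load: function (pageUrl) {
      var finalUrl = resolveRedirect(pageUrl) // the 302 itself is never cached
      if (!(finalUrl in cache)) {
        cache[finalUrl] = 'bytes of ' + pageUrl
        downloads++
      }
      return cache[finalUrl]
    },
    downloads: function () { return downloads }
  }
}

// Case 1: the server always redirects to the same URL.
var stable = makeBrowser(function (url) { return 'https://cdn.invalid' + url })

// Case 2: every redirect carries a fresh signature (a counter stands in for it).
var counter = 0
var signed = makeBrowser(function (url) {
  counter++
  return 'https://cdn.invalid' + url + '?Signature=' + counter
})

stable.load('/image/42'); stable.load('/image/42')
signed.load('/image/42'); signed.load('/image/42')

console.log(stable.downloads()) // 1: the second load is served from the cache
console.log(signed.downloads()) // 2: the new URL misses the cache
```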

I found that forcing the browser to cache the signed URL as well as the data it receives solves the problem. But I don't like the idea of caching a redirect URL that will become invalid. I mean, if the browser loses the image from its cache somehow, it'll try to request it again using the expired signed URL from the cache. So I think it's not an option.

And it doesn't matter whether CloudFront serves images faster or whether browsers limit the number of parallel downloads per hostname: the advantage of using the browser cache outweighs all the disadvantages of piping images through your server.

And it looks like most social networks solve the problem of private images by hiding the actual URLs behind some private proxies. They store all their content on public servers, but there is no way to get the URL of a private image without authorization. Of course, if you open a private image in a new tab and send the URL to a friend, he'll be able to see the image too. So if that's not acceptable for you, it'll be best to use Jonathan Ong's solution.

Community
Leonid Beschastny
  • Are you sure that it will be slower than generating a signed URL and redirecting to it? Slower in what respect? Can you be specific? – Saransh Mohapatra Jul 16 '13 at 23:50
  • Ok, thanks for your answer. I will benchmark this and get back to you with the results, so that it can be useful for others. Your answer is the most descriptive I have found, so thanks again. – Saransh Mohapatra Jul 17 '13 at 19:08
  • I slightly reorganized my answer by merging my last update (about 302 redirect) into it. – Leonid Beschastny Jul 17 '13 at 19:30
  • @SaranshMohapatra It looks like I was wrong about caching in your case. See my update. – Leonid Beschastny Jul 18 '13 at 14:25
  • Thanks for coming up with this, I really appreciate it. About your suggestion as to what most social networks do: just take a look at Dropbox. Suppose you open a photo and somehow get its original URL; another Dropbox user still can't open it. That's the kind of solution I am looking for. – Saransh Mohapatra Jul 18 '13 at 15:19
  • I had to give the accepted answer to Jonathan Ong, because that's the solution I eventually went with. But I still really appreciate your answer. – Saransh Mohapatra Jul 21 '13 at 06:48
1

I would be concerned with using the CloudFront option if the photos really do need to remain private. It seems like you'll have a lot more flexibility in administering your own security policy. I think the nginx setup may be more complex than is necessary. Express should give you very good performance working as a remote proxy where it uses request to fetch items from S3 and streams them through to authorized users. I would highly recommend taking a look at Asset Rack, which uses hash signatures to enable permanent caching in the browser. You won't be able to use the default Racks because you need to calculate the MD5 of each file (perhaps on upload?) which you can't do when it's streaming. But depending on your application, it could save you a lot of effort for browsers never to need to refetch the images.
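As a rough sketch of the fingerprinting idea, the MD5 can be computed once at upload time, while the whole file is in memory, and baked into the S3 key. The fingerprintedKey helper and the name-hash.ext naming scheme are my own assumptions for illustration, not necessarily what Asset Rack does, and "private" in the header is an assumption for photos that shouldn't sit in shared caches.

```javascript
var crypto = require('crypto')
var path = require('path')

// Hypothetical helper: derive a content-addressed S3 key from the file bytes.
function fingerprintedKey(originalName, buffer) {
  var md5 = crypto.createHash('md5').update(buffer).digest('hex')
  var ext = path.extname(originalName)          // e.g. ".jpg"
  var base = path.basename(originalName, ext)   // e.g. "beach"
  return base + '-' + md5 + ext
}

// Headers for a response whose URL changes whenever the content changes,
// so the browser never needs to refetch it. One year is a common upper bound.
var ONE_YEAR = 365 * 24 * 60 * 60
function permanentCacheHeaders() {
  return { 'Cache-Control': 'private, max-age=' + ONE_YEAR }
}

var key = fingerprintedKey('beach.jpg', Buffer.from('hello'))
console.log(key) // beach-5d41402abc4b2a76b9719d911017c592.jpg
console.log(permanentCacheHeaders())
```

Because the key depends only on the bytes, re-uploading an edited photo yields a new key and therefore a new URL, which is what makes the long cache lifetime safe.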

Dan Kohn
  • What if I don't stream the photos, but rather wait for the file to be downloaded into a buffer and then give it back to the user? – Saransh Mohapatra Jul 08 '13 at 05:55
  • 1
    When you receive the file, fingerprint it using the same MD5 hash that asset-rack uses: https://github.com/techpines/asset-rack/blob/master/lib/asset.coffee#L200 . Upload to S3 using that filename with knox. Then, when a user requests the file, use everyauth or passport to authenticate, and if successful, use knox to fetch the file from S3 and serve it to the user, with cache-control set to 1 year caching, the way asset rack does. – Dan Kohn Jul 15 '13 at 03:46
  • Yeah... your idea looks great, but why should I use knox when I can use AWS-SDK for it? – Saransh Mohapatra Jul 15 '13 at 06:22
  • Either should work fine, but knox is actually much more widely deployed, and focuses just on file uploads and downloads, as opposed to all other Amazon services. If you want to get fancy, you could also consider having Node.js prepare a manifest, and have the client actually upload directly to S3. http://bencoe.tumblr.com/post/30685403088/browser-side-amazon-s3-uploads-using-cors – Dan Kohn Jul 15 '13 at 10:45
  • No I wouldn't want that....as I have certain manipulations to do on the file before allowing it to be uploaded. Thanks for your answer. – Saransh Mohapatra Jul 16 '13 at 06:13
0

Regarding your second option, you should be able to set cache control headers directly in S3.

Regarding your first option: have you considered securing your images a different way? When you store an image in S3, couldn't you use a hashed and randomised filename? It would be quite straightforward to make the filename difficult to guess, and this way you'll have no performance issues when viewing the images.

This is the technique Facebook uses. You can still view an image when you're logged out, as long as you know the URL.
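A minimal sketch of generating such a filename with Node's crypto module follows. The randomKey helper and the photos/ prefix are invented for the example, and 16 random bytes is just one reasonable choice of length.

```javascript
var crypto = require('crypto')
var path = require('path')

// Hypothetical helper: replace the uploaded file's name with an unguessable one.
// 16 random bytes = 128 bits, hex-encoded to 32 characters.
function randomKey(originalName) {
  var token = crypto.randomBytes(16).toString('hex')
  return 'photos/' + token + path.extname(originalName).toLowerCase()
}

var a = randomKey('holiday.JPG')
var b = randomKey('holiday.JPG')
console.log(a)
console.log(a === b) // false (with overwhelming probability)
```

This only makes the URL hard to guess. As noted above, anyone who does learn the URL can still view the image.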

Rob Squires
  • I don't want anyone with the URL to be able to view the image. As for setting cache-control headers directly in S3: every time the user wants to view a photo, he requests it from the app and I generate a URL and send it to him, so the cache-control headers wouldn't matter much. – Saransh Mohapatra Jul 13 '13 at 08:06
  • understood, I've posted another option – Rob Squires Jul 13 '13 at 08:42
  • Where have you posted another option? If you mean the Facebook technique, then I have already told you that I don't want anyone with the URL to be able to view the images if he doesn't have the rights to. – Saransh Mohapatra Jul 13 '13 at 14:35