I am experimenting with AWS S3 and CloudFront for a web application that I am developing.

In the app, I let users upload files to an S3 bucket (using the AWS SDK) and make them available via the CloudFront CDN. The issue is that even when a file has been uploaded and is ready in the S3 bucket, it takes a minute or two to become available at the CloudFront CDN URL. Is this normal?

MatthewMartin
Ahsan
  • Yes, it is. It takes a couple of minutes to propagate the content across edge locations – Khalid T. Feb 21 '16 at 09:16
  • That's not the idea behind CDNs. If your application is intolerant in terms of caching and expiration, then you'd be better off with S3 and leave CloudFront for static content only. – Khalid T. Feb 21 '16 at 10:14
  • You could generate a dummy request (from your code) for the file as soon as you finish uploading it to force the CloudFront distribution to get it from the origin immediately. – Khalid T. Feb 21 '16 at 10:39
  • @KhalidT. your description of the way CloudFront works is incorrect. New files in S3 do not get propagated to every edge location when they are created. Each edge location will fetch a file and add it to the edge location's cache the first time a file is requested. I recommend you read this page to understand how CloudFront (and most other CDNs for that matter) work: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HowCloudFrontWorks.html – Mark B Feb 21 '16 at 16:35
  • Is it possible that you are trying to request the file from CloudFront *before* it is uploaded to S3? Or before the upload is complete? That's going to poison the cache for a couple of minutes, because the non-existence of the object will briefly be cached by you making that premature request. CloudFront is *absolutely* suited to real-time operations, there's only one reason for the behavior you describe. Check the response headers for an `Age:` which tells you how long a cached response has been cached. – Michael - sqlbot Feb 21 '16 at 16:59
  • @Michael-sqlbot I'm positive that the files exist in the S3 bucket, and I also made sure that they are publicly visible. – Ahsan Feb 21 '16 at 17:05
  • Right, but I'm asking if you tried to download *before* you tried to upload, before the file was there, because if you did, that would cause what you see. Do the error responses include an `Age:` header? – Michael - sqlbot Feb 21 '16 at 17:14
  • It's Google Chrome! http://chrome.blogspot.com.au/2012/01/speed-and-security.html It loads the page in the background before I hit enter. I had actually typed the URL and was waiting for the file to finish uploading (in another tab). You are so right! – Ahsan Feb 21 '16 at 17:20

3 Answers

CloudFront attempts to fetch uncached content from the origin server in real time. There is no "replication delay" or similar issue because CloudFront is a pull-through CDN. Each CloudFront edge location knows only about your site's existence and configuration; it doesn't know about your content until it receives requests for it. When that happens, the CloudFront edge fetches the requested content from the origin server, and caches it as appropriate, for serving subsequent requests.

The issue that's occurring here is related to a concept sometimes called "negative caching" -- caching the fact that a request won't work -- which is typically done to avoid hammering the origin of whatever's being cached with requests that are likely to fail anyway.

By default, when your origin returns an HTTP 4xx or 5xx status code, CloudFront caches these error responses for five minutes and then submits the next request for the object to your origin to see whether the problem that caused the error has been resolved and the requested object is now available.

— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/custom-error-pages.html

If the browser, or anything else, tries to download the file from that particular CloudFront edge before the upload into S3 is complete, S3 will return an error, and CloudFront -- at that edge location -- will cache that error and remember, for the next 5 minutes, not to bother trying again.
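
To make the sequence concrete, here is a toy model of a single pull-through edge with negative caching. It is only a sketch for illustration, not CloudFront's actual implementation: the `EdgeCache` class, its `error_ttl` field, and the dictionary standing in for the S3 bucket are all hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class EdgeCache:
    """Toy model of one pull-through edge location (hypothetical, for illustration)."""

    # Returns the object body, or None when the origin has no such object.
    fetch_from_origin: Callable[[str], Optional[bytes]]
    # How long (seconds) an error response is remembered; 300 mirrors the
    # five-minute default quoted above.
    error_ttl: float = 300.0
    _errors: Dict[str, float] = field(default_factory=dict)   # path -> expiry time
    _objects: Dict[str, bytes] = field(default_factory=dict)  # path -> cached body

    def get(self, path: str, now: float) -> Tuple[int, Optional[bytes]]:
        if path in self._objects:
            return 200, self._objects[path]
        expiry = self._errors.get(path)
        if expiry is not None and now < expiry:
            # Negative cache hit: the origin is not contacted at all.
            return 404, None
        body = self.fetch_from_origin(path)
        if body is None:
            if self.error_ttl > 0:
                self._errors[path] = now + self.error_ttl
            return 404, None
        self._errors.pop(path, None)
        self._objects[path] = body
        return 200, body


bucket: Dict[str, bytes] = {}            # stands in for the S3 origin
edge = EdgeCache(fetch_from_origin=bucket.get)

print(edge.get("/a.png", now=0))         # premature request -> (404, None)
bucket["/a.png"] = b"data"               # upload completes afterwards
print(edge.get("/a.png", now=60))        # still (404, None): cached error
print(edge.get("/a.png", now=301))       # (200, b'data') once the error expires
```

With `error_ttl=0` in the same model, the request at `now=60` would go back to the origin and succeed, which is the effect the configuration change described below is aiming for.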

Not to worry, though -- this timer is configurable, so if the browser is doing this under the hood and outside your control, you should still be able to fix it.

You can specify the error-caching duration—the Error Caching Minimum TTL—for each 4xx and 5xx status code that CloudFront caches. For a procedure, see Configuring Error Response Behavior.

— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/custom-error-pages.html


To configure this in the console:

  • When viewing the distribution configuration, click the Error Pages tab.

  • For each error where you want to customize the timing, begin by clicking Create Custom Error Response.

  • Choose the error code you want to modify from the drop-down list, such as 403 (Forbidden) or 404 (Not Found) -- your bucket configuration determines which code S3 returns for missing objects, so if you aren't sure, change 403 then repeat the process and change 404.

  • Set Error Caching Minimum TTL (seconds) to 0

  • Leave Customize Error Response set to No (If set to Yes, this option enables custom response content on errors, which is not what you want. Activating this option is outside the scope of this question.)

  • Click Create. This takes you back to the previous view, where you'll see Error Caching Minimum TTL for the code you just defined.

Repeat these steps for each HTTP response code you want to change from the default behavior (which is the 300 second hold time, discussed above).

When you've made all the changes you want, return to the main CloudFront console screen where the distributions are listed. Wait for the distribution state to change from In Progress to Deployed (this used to take quite some time, but it now typically takes about 5 minutes for the changes to be pushed out to all the edges), then test.
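
The same change can probably be scripted instead of clicked through. The sketch below assumes Python with boto3 and valid AWS credentials; the helper names `with_error_caching_ttl` and `apply_to_distribution` and the distribution ID are made up for illustration. It reads the current distribution config, sets `ErrorCachingMinTTL` for the chosen status codes, writes the config back, and waits for deployment. Treat it as a starting point and check it against your own distribution before relying on it.

```python
import copy
from typing import Any, Dict, Iterable


def with_error_caching_ttl(
    config: Dict[str, Any], codes: Iterable[int], ttl_seconds: int = 0
) -> Dict[str, Any]:
    """Return a copy of a DistributionConfig dict with ErrorCachingMinTTL
    set for each given HTTP status code (hypothetical helper)."""
    updated = copy.deepcopy(config)
    section = updated.get("CustomErrorResponses") or {}
    # Index existing entries by status code so they are updated, not duplicated.
    items = {item["ErrorCode"]: item for item in (section.get("Items") or [])}
    for code in codes:
        entry = items.setdefault(code, {"ErrorCode": code})
        entry["ErrorCachingMinTTL"] = ttl_seconds
    ordered = [items[code] for code in sorted(items)]
    updated["CustomErrorResponses"] = {"Quantity": len(ordered), "Items": ordered}
    return updated


def apply_to_distribution(
    distribution_id: str, codes: Iterable[int] = (403, 404), ttl_seconds: int = 0
) -> None:
    """Push the change to CloudFront and block until the state is Deployed."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("cloudfront")
    current = client.get_distribution_config(Id=distribution_id)
    new_config = with_error_caching_ttl(
        current["DistributionConfig"], codes, ttl_seconds
    )
    client.update_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],  # required: ETag of the config being replaced
        DistributionConfig=new_config,
    )
    client.get_waiter("distribution_deployed").wait(Id=distribution_id)


# Example call (the ID is a placeholder, not a real distribution):
# apply_to_distribution("EXAMPLEDISTID")
```

Keeping the dictionary edit separate from the API calls makes it possible to inspect the resulting config before anything is sent to AWS.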

Michael - sqlbot

Are these new files being written to S3 for the first time, or are they updates to existing files? S3 provides read-after-write consistency for new objects, and given CloudFront's pull model you should not be having this issue with new files written to S3. If you are, then I would open a ticket with AWS.

If these are updates to existing files, then you have both S3 eventual consistency and CloudFront cache expiration to deal with, either of which could cause this sort of behavior.

Mark B
  • These are new files. I am also confused about why it would take one to two minutes to pull fresh files: when the CDN is asked for a URL it doesn't know about, it should certainly query the origin for its existence. I will open a ticket as you suggested. – Ahsan Feb 21 '16 at 16:49

As you observed in your comment, it seems that Google Chrome is interfering with your upload/preview strategy:

  1. Chrome requests the URL before the content exists there.
  2. CloudFront caches the resulting error response.
  3. You upload the file to S3.
  4. When you preview the uploaded file, CloudFront answers with the cached error response (step 2).
  5. After the CloudFront cache entry expires, CloudFront hits the origin and the problem can no longer be reproduced.
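
One way to check whether you are in step 4 is to look at the `Age` header mentioned in the comments on the question: an error response that carries an `Age` value was replayed from the edge cache and did not come fresh from the origin. The following is a rough sketch using only the Python standard library; the function names `cached_error_age` and `probe` are hypothetical, and the URL you pass in would be your own CloudFront URL.

```python
import urllib.error
import urllib.request
from typing import Mapping, Optional


def cached_error_age(status: int, headers: Mapping[str, str]) -> Optional[int]:
    """Return the Age in seconds if this is an error response that carries an
    Age header (i.e. it looks like a cached error), otherwise None."""
    if status < 400:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    age = lowered.get("age")
    if age is None or not age.strip().isdigit():
        return None
    return int(age)


def probe(url: str) -> Optional[int]:
    """Fetch the URL once and report the age of a cached error, if any."""
    try:
        with urllib.request.urlopen(url) as resp:
            return cached_error_age(resp.status, dict(resp.headers.items()))
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the headers are still available on the error.
        return cached_error_age(err.code, dict(err.headers.items()))


# Header values here are invented sample data:
print(cached_error_age(404, {"Age": "42"}))   # 42 -> error served from cache
print(cached_error_age(404, {}))              # None -> error came from the origin
```
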
Alessandro Oliveira