So I'm using a very simple CDN service. You point it at your website, and when content is requested through the CDN's hostname, the CDN caches it after the first request.

I use this for all my static content, like JavaScript files and images.

This all works perfectly, and I like that it has very little maintenance or setup cost.

The problem starts when rolling out new versions of JavaScript files. Each JavaScript URL automatically gets a new hash whenever the file changes.

Because the rollout across multiple instances is not simultaneous, a problem occurs. I tried to model it in this diagram:

Diagram

In words:

  • A request hits a server with the new version
  • The browser requests the JS file with the new version hash
  • The CDN correctly detects that the file is not cached
  • The CDN requests the original file with the new hash from the load balancer
  • The load balancer routes the CDN's request to a random server, accidentally serving it from a server that still has the old version
  • The CDN caches the old version under the new hash
  • Everyone gets served the old version from the CDN
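The sequence above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the `Origin` and `Cdn` classes are hypothetical stand-ins, build `350` is an assumed "old" build, and the load balancer is modelled as a simple rotation that happens to hand the CDN's first fetch to the old instance.

```python
import itertools


class Origin:
    """One application instance that serves a fixed build of the script."""

    def __init__(self, build: str):
        self.build = build

    def serve(self, url: str) -> str:
        # The query string is ignored: the instance returns whatever build it has.
        return f"contents of build {self.build}"


class Cdn:
    """Pull-through cache keyed by the full URL, including the query string."""

    def __init__(self, pick_origin):
        self.pick_origin = pick_origin
        self.cache = {}

    def get(self, url: str) -> str:
        if url not in self.cache:
            # Cache miss: fetch from whichever instance the balancer picks.
            self.cache[url] = self.pick_origin().serve(url)
        return self.cache[url]


old_instance = Origin("350")  # hypothetical instance not yet redeployed
new_instance = Origin("351")  # instance already running the new build

# Stand-in for the load balancer: the first fetch lands on the old instance.
balancer = itertools.cycle([old_instance, new_instance])
cdn = Cdn(lambda: next(balancer))

first = cdn.get("/scripts/example.js?v=351")
second = cdn.get("/scripts/example.js?v=351")
print(first)
print(second)
```

Both requests return the old build, even though the URL carries the new version: the cache key changed, but the content behind it did not.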

There are some ways I know to fix this, e.g. manually uploading files to a separate storage with the hash baked in. But this needs extra code and has more "moving parts", which makes maintenance more complicated.

I would prefer to have something that works as seamlessly as the normal CDN behavior. I guess this is a common problem for sites that are running on multiple instances, but I can't find a lot of information about this.

What is the common way to solve this?

Edit

I think another solution would be to somehow force the CDN to go to the same instance for the .js file as for the original HTML file, but how?

Dirk Boer

3 Answers

Here are a few ideas from my solutions in the past, though the CDN you are using will rule out some of these:

  1. Exclude .js files from the CDN caching service, which prevents them from being cached in the first place.
  2. Poke the CDN with a request to invalidate the cache for a specific file at the time of release.
  3. In your build/deploy script, change the name of the .js file and reference the new file in your HTML.
  4. Use a query parameter after the .js file name; the server ignores it, but the file is cached under a different address, e.g. /mysite/myscript.js?build1234

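As a rough illustration of option 4, the helper below appends a build identifier to an asset URL. It is a minimal sketch: the function name `versioned_url` and the parameter name `v` are assumptions, and it only helps if the CDN includes the query string in its cache key.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def versioned_url(url: str, build: str) -> str:
    """Return `url` with a cache-busting `v=<build>` query parameter appended."""
    parts = urlsplit(url)
    extra = urlencode({"v": build})
    # Keep any query string that is already present.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


print(versioned_url("/mysite/myscript.js", "1234"))
```

A template would call this wherever it emits a script tag, so a new build changes every script URL without touching the files themselves.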
Matt D

The problem with this kind of issue is that cache control resides on the browser side, so you cannot do much from the server side.

The most common way I know is basically the one you mention: adding a hash to the file names or to the URLs you use to get them.

The thing is that you should not do this manually. You should use a web application builder, like Webpack, to automate this process; which one will depend on the technologies you are using. I saw this for the first time using GWT 13 years ago, and all the recent projects I worked on, using AngularJS or React, were integrated with builders that do what you need automatically.

Once it's implemented, your users will get the latest version, and resources will be cached correctly to speed up your site.

If you can also automate the full pipeline to remove the old resources from the CDN once their configured expiration has been reached, you have touched the sky.
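To make the idea concrete, here is a minimal sketch of what such a builder does when it names output files. It is not Webpack's implementation; `hashed_name` is a hypothetical helper, and the choice of SHA-256 truncated to 8 characters is an assumption.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # e.g. scripts/main.js becomes scripts/main.<digest>.js
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = hashed_name("scripts/main.js", b"console.log('v1');")
new_name = hashed_name("scripts/main.js", b"console.log('v2');")
print(old_name)
print(new_name)
```

Because the name is derived from the content, an unchanged file keeps its URL (and its cache entry), while any change produces a URL that has never been cached.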

Jorge Garcia
  • Hi @Jorge - thanks for your answer! Everything is working perfectly in a single-instance environment. Everything is already automated. It goes wrong in the multi-instance environment. Webpack wouldn't solve that problem. I think I didn't write my question clearly enough. Thanks for taking the time to answer. – Dirk Boer Jun 22 '20 at 21:41
  • Hi @DirkBoer. What I meant was that the output from Webpack should be JS files like "main.build12345.js" and an "index.html" file pointing to those. The client will hit the CDN requesting "main.build12345.js", and there is no way it goes and gets a file called "main.build12344.js" instead. The only thing that can be cached wrongly is "index.html". The rest will all be new files with new names. This approach will require cleaning old deployments from the CDN every so often. Did I miss anything? – Jorge Garcia Jun 26 '20 at 04:11
  • Good point actually! I'm so used to putting it into the hash, but really changing the file name would actually make a difference here. I still have the problem that if I use real file names, the CDN could request the file from a server that hasn't got the new version yet, resulting in a 404 for the script file. – Dirk Boer Jun 29 '20 at 07:11

In the end I fixed this by only referencing the CDN version after a few minutes of runtime.

So if the runtime is less than 5 minutes, it refers to:

/scripts/example.js?v=351

After 5 minutes it refers to the CDN version:

https://cdn.example.com/scripts/example.js?v=351

After 5 minutes we are pretty sure that all instances are running the new version, so we don't accidentally cache an old version under the new hash.

The downside is that if you redeploy at a very busy moment, you don't get the benefit of the CDN for those first minutes, but I haven't seen a better alternative yet.
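The switch described above could look roughly like this. It is a sketch, not the code actually running on the site: the class name `AssetUrls`, the constants and the injectable `clock` parameter are all assumptions made for illustration, with the clock injected so the behaviour can be checked without waiting five minutes.

```python
import time

CDN_BASE = "https://cdn.example.com"
WARMUP_SECONDS = 5 * 60  # assumed upper bound for a rollout to reach all instances


class AssetUrls:
    """Builds script URLs, bypassing the CDN while this instance is young."""

    def __init__(self, version: str, clock=time.monotonic):
        self._version = version
        self._clock = clock
        self._started = clock()  # taken once, at process start

    def url(self, path: str) -> str:
        local = f"{path}?v={self._version}"
        if self._clock() - self._started < WARMUP_SECONDS:
            # Other instances may still run the old build: serve from this host.
            return local
        return CDN_BASE + local


# Usage with a fake clock so the example runs instantly.
now = [0.0]
assets = AssetUrls("351", clock=lambda: now[0])
during_rollout = assets.url("/scripts/example.js")
now[0] = 301.0
after_rollout = assets.url("/scripts/example.js")
print(during_rollout)
print(after_rollout)
```

Note that the timer is per instance, so each instance starts pointing at the CDN five minutes after its own start, not five minutes after the rollout as a whole finished.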

Dirk Boer