19

I am wondering if anyone has any plugins or capistrano recipes that will "pre-heat" the page cache for a rails app by building all of the page cached html at the time the deployment is made, or locally before deployment happens.

I have some mostly static sites that do not change much, and would run faster if the html was already written, instead of requiring one visitor to hit the site.

Rather than create this myself (seems easy, but it's low priority), does it already exist?

Scott Miller
  • 2,298
  • 3
  • 21
  • 24

4 Answers

19

You could use wget or another program to spider the site. In fact, this sort of scenario is mentioned as one of the uses in its manual page:

This option tells Wget to delete every single file it downloads, after having done so. It is useful for pre-fetching popular pages through a proxy, e.g.:

   wget -r -nd --delete-after http://whatever.com/~popular/page/

The -r option is to retrieve recursively, and -nd to not create directories.
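Since the question also asks for a Capistrano recipe, a minimal sketch of wrapping this wget call in a Capistrano 2 task might look like the following (the hostname, path, and hook point are placeholders, not a tested recipe):

```ruby
# Hypothetical Capistrano 2 recipe: spider the site after each deploy
# so the page cache is pre-heated before real visitors arrive.
namespace :deploy do
  desc "Pre-heat the page cache by spidering the freshly deployed site"
  task :warm_cache, :roles => :web do
    # -q keeps the deploy output clean; adjust the URL for your app.
    run "wget -r -nd --delete-after -q http://whatever.com/~popular/page/"
  end
end

# Run the warm-up once the new code is live.
after "deploy:restart", "deploy:warm_cache"
```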

  • Yes, we have a few curl requests in our deployment script, not only to warm the cache, but also just to get the server up and running (e.g. the first request to nginx+passenger can take 40 seconds or so) – tardate Oct 06 '11 at 16:32
4

I use a rake task that looks like this to refresh my page cached sitemap every night:

 require 'action_controller/integration'

 ActionController::Base.expire_page("/sitemap.xml")
 app = ActionController::Integration::Session.new
 app.host = "notexample.com"
 app.get("/sitemap.xml")

See http://gist.github.com/122738
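Per the comment from andialles below, on Rails 4.2+ the integration session moved to ActionDispatch, so the equivalent is roughly the following (an untested sketch adapted from that comment, not something from the gist):

```ruby
# Rails 4.2+ variant of the snippet above (untested sketch):
# the integration session now lives under ActionDispatch and
# must be initialized with the Rails application.
require 'action_dispatch/testing/integration'

ActionController::Base.expire_page("/sitemap.xml")
app = ActionDispatch::Integration::Session.new(Rails.application)
app.host = "notexample.com"
app.get("/sitemap.xml")
```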

Marcus
  • 12,296
  • 5
  • 48
  • 66
  • I prefer this Rails-based approach. For instance, I generally block wget as a client for all of my pages due to rogue page scraping. – dc10 Nov 12 '16 at 16:13
  • I'm not sure which rails version introduced the change, but on rails 4.2 I need to `require 'action_dispatch/testing/integration'` and then initialize with `app = ActionDispatch::Integration::Session.new Rails.application` – andialles Mar 20 '17 at 11:08
2

I have set integration tests that confirm all of the main areas of the site are available (a few hundred pages in total). They don't do anything that changes data - just pull back the pages and forms.

I don't currently run them when I deploy my production instance, but now that you mention it, it may actually be a good idea.

Another alternative would be to pull every page that appears in your sitemap (if you have one, which you probably should). It should be really easy to write a gem / rake script that does that.
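A minimal sketch of such a script, assuming a standard sitemap.xml with `<loc>` entries and no XML namespace complications (the function names here are my own, not from an existing gem):

```ruby
require 'rexml/document'
require 'net/http'
require 'uri'

# Pull every <loc> URL out of a sitemap XML string.
# Assumes a plain sitemap without namespace handling.
def sitemap_urls(xml)
  REXML::Document.new(xml).elements.to_a('//loc').map(&:text)
end

# Request each page listed in the sitemap so the page cache
# gets written before any real visitor hits the site.
def warm_from_sitemap(sitemap_url)
  xml = Net::HTTP.get(URI(sitemap_url))
  sitemap_urls(xml).each { |url| Net::HTTP.get_response(URI(url)) }
end
```

Dropping `warm_from_sitemap("http://whatever.com/sitemap.xml")` into a rake task or a post-deploy hook would then warm every listed page.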

RichH
  • 6,108
  • 1
  • 37
  • 61
2

Preloading this way -- generally, with a cron job set to start at 10 pm Pacific and terminate at 6 am Eastern -- is a nice way to load-balance your site.

Check out the spider_test rails plugin for a simple way to do this in testing.

If you're going to use the wget command above, add the --level=, --no-parent, --wait=SECONDS and --waitretry=SECONDS options to throttle your load. You might as well also log and capture the header responses for diagnosis or later analysis (change the path from /tmp if desired):

wget -r --level=5 --no-parent --delete-after \
  --wait=2 --waitretry=10  \
  --server-response        \
  --append-output=/tmp/spidering-`date "+%Y%m%d"`.log \
  'http://whatever.com/~popular/page/'
mrflip
  • 822
  • 6
  • 7