I've just set up static page caching using Zend_Cache_Backend_Static to serve cached HTML files in my application, and it is working great. My only concern is the way it caches pages requested with $_GET parameters. Because it automatically creates a folder structure that maps to the supplied URL route, is this a potential security risk when large numbers of $_GET parameters are deliberately appended to existing pages, for example by hitting a maximum directory depth or a maximum file name length?

For example: At the moment I'm caching my pages into /public/cache/static/ so using the standard router /module/controller/action/param1/val1/param2/val2 or standard query string /module/controller/action?param1=val1&param2=val2 would create the following directory structures:

/public/cache/static/module/controller/action/param1/val1/param2/val2.html 
/public/cache/static/module/controller/action?param1=val1&param2=val2.html

Allowing people to create a directory structure in this way (however limited) worries me slightly. Zend_Cache_Backend_Static and the corresponding Zend_Cache_Frontend_Capture must both be configured in the ini file rather than via the Zend_Cache factory, and they don't appear to have any setup options.

Could it just be a case of replacing the default router with custom routes that limit the number of $_GET variables? Is this possible, or would I need to specify exactly the variables I need for each route (not the end of the world, but a bit more limiting)?
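To make the concern concrete, here is a minimal sketch of the kind of check involved. The function name and the default limits are assumptions for illustration (255 bytes per path component and 4096 bytes per full path are typical on Linux filesystems); nothing here is part of Zend_Cache_Backend_Static.

```php
<?php
// Hypothetical helper: reports whether a cache file derived from $requestUri
// would stay within the given limits. The defaults are assumed, typical
// Linux values, not limits enforced by Zend Framework.
function fitsFilesystemLimits($cacheDir, $requestUri, $maxName = 255, $maxPath = 4096)
{
    // Build the file name the way the examples above suggest: URI + ".html".
    $file = rtrim($cacheDir, '/') . '/' . ltrim($requestUri, '/') . '.html';

    // The whole path must fit.
    if (strlen($file) > $maxPath) {
        return false;
    }

    // Every directory or file name along the way must fit as well.
    foreach (explode('/', $file) as $component) {
        if (strlen($component) > $maxName) {
            return false;
        }
    }

    return true;
}

var_dump(fitsFilesystemLimits('/public/cache/static', '/module/controller/action?param1=val1'));
var_dump(fitsFilesystemLimits('/public/cache/static', '/page?x=' . str_repeat('a', 300)));
```

A long query string ends up in a single file name, so it runs into the per-component limit, while many path parameters increase the directory depth and the total path length instead.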

Update:

So the existing rewrite rule to handle the static cache is as follows:

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/index.html -f
RewriteRule ^/*$ cached/index.html [L]

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/%{REQUEST_URI}\.html -f
RewriteRule .* cached/%{REQUEST_URI}\.html [L]

RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]

RewriteRule ^.*$ index.php [NC,L]

If the request hits a page in the static cache it will send that html page. If not it will hit Zend Framework and generate it.
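As a rough illustration of the lookup those rules perform, the sketch below maps a request path to the file the RewriteCond lines test for. The function name is hypothetical and this only mirrors the rewrite rules, not Zend's own code. Note that mod_rewrite's %{REQUEST_URI} does not include the query string, whereas PHP's $_SERVER['REQUEST_URI'] does.

```php
<?php
// Illustrative only: mirrors the file lookup done by the rewrite rules.
function cachedFileFor($documentRoot, $requestPath)
{
    // The first rule pair special-cases the site root.
    if (trim($requestPath, '/') === '') {
        return $documentRoot . '/cached/index.html';
    }

    // The second pair appends ".html" to the request path.
    return $documentRoot . '/cached/' . ltrim($requestPath, '/') . '.html';
}

echo cachedFileFor('/var/www', '/'), "\n";
echo cachedFileFor('/var/www', '/module/controller/action'), "\n";
```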

I could add the following to the start:

RewriteCond %{QUERY_STRING} \S
RewriteRule [^\?]+ /$0? [R=301,L]

This will wipe my query string altogether, which is fine as I can still pass $_GET variables in using the URL path method of Zend Framework (which I have also limited by providing very explicit routes). But is it possible to do this without redirecting?

baseten

2 Answers

1

The ideal way would be to define this as a RewriteCond, but I'm not sure it's possible to count the number of GET params using mod_rewrite.

So probably the best solution would be to redirect to a standalone php script that decides whether or not to use the cached html files.

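<?php

// Bypass the cache when the request carries too many GET parameters.
if (count($_GET) >= 20) {
  require __DIR__ . '/index.php';
} else {
  // Placeholder path: substitute the cached file for this request.
  require '/path/to/cache.html';
}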
Stephen Fuhry
  • The idea with Zend_Cache_Backend_Static though is to bypass PHP and the Zend Framework MVC stack entirely if a cache file is available. It does this by using mod_rewrite to redirect to an existing html page. I've added more detail to my question above. – baseten Feb 20 '12 at 11:02
  • Right, I understand.. but what I'm saying is that I don't think that it's possible due to the limitations of mod_rewrite. But either way, running the above script to decide whether to use the cache will be far less expensive than loading the entire MVC stack. – Stephen Fuhry Feb 20 '12 at 13:55
  • True, but we're looking to avoid PHP and Apache altogether on html cache hits, either via CDN or reverse proxy nginx. I'm happy enough completely removing the query string and dealing with url path variables, I'd just prefer not to have to explicitly define each route. – baseten Feb 20 '12 at 18:50
  • In that case, you could do something like add a `usecache=true` `GET` param, and watch for it using something like this `RewriteCond %{QUERY_STRING} ^([^&]*&)*usecache=true(&|$)`... I'm not sure how to avoid Apache though. – Stephen Fuhry Feb 21 '12 at 21:41
  • The above mod_rewrite already handles this caching. If a ZF formatted URL hits an existing cache file, then this is served, if not it hits PHP. When serving cache files, we can run Apache on a port other than 80, with something like nginx on 80 which serves the cache files if present or forwards to Apache if not. This is a reverse proxy and is basically making our own CDN. Our worry is really the issue with GET params. Which the below solution solves, we can be less restrictive with routes by limiting the number of slashes in the request. – baseten Feb 22 '12 at 13:31
0

OK, so the RewriteRule stripping the query string will work without a redirect.

The issue (I suspect) is that Zend_Cache_Backend_Static is using $_SERVER['REQUEST_URI'] somewhere along the line and therefore still sees the original URI, query string included. My knowledge of mod_rewrite is pretty slim and I didn't realise that this value isn't altered by a rewrite.

So, to prevent files and directories being created by massive query strings I've had to do the following things:

Firstly for standard query strings:

Strip the query string at the start of my mod_rewrite, without redirecting:

RewriteCond %{QUERY_STRING} \S
RewriteRule [^\?]+ /$0?

In my index.php I'm then changing $_SERVER['REQUEST_URI'] to match the rewrite, by stripping the query string, which means I don't need to hack ZF any longer:

$queryIndex = strpos($_SERVER['REQUEST_URI'], '?');
if($queryIndex !== false) {
    $_SERVER['REQUEST_URI'] = substr($_SERVER['REQUEST_URI'], 0, $queryIndex);
}

This will now prevent ANY query string from being interpreted by my application. To pass variables to pages I am therefore using Zend Framework url path parameters. To prevent these from creating excessively deep cache folders, I've replaced the default route with a few very explicitly defined routes in the Bootstrap:

$frontController = Zend_Controller_Front::getInstance(); 
$router = $frontController->getRouter();

$route = new Zend_Controller_Router_Route(
    ':module/:controller/:action',
    array(
        'module' => 'default',
        'controller' => 'index',
        'action' => 'index'
    )
);

$router->addRoute('default', $route);

$route = new Zend_Controller_Router_Route(
    'article/:alias',
    array(
        'module' => 'default',
        'controller' => 'article',
        'action' => 'index',
        'alias' => ''
    )
);

$router->addRoute('article', $route);

Here, I've replaced the default route so no additional parameters are allowed. Any actions which do require parameters therefore have to be explicitly set, for example in my second route. This means there could potentially be a lot of defined routes. Thankfully this is not the case in my particular application.

A way around restricting the routes so much and allowing some GET params via ZF URL paths is to set a limit on the number of slashes in the REQUEST_URI, effectively limiting the max directory depth of the static page cache (10 below). This can also be altered in index.php:

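// Truncate the URI just before its 11th slash so at most 10 remain.
if (substr_count($_SERVER['REQUEST_URI'], '/') > 10) {
    preg_match_all('/\//', $_SERVER['REQUEST_URI'], $capture, PREG_OFFSET_CAPTURE);
    // $capture[0][10][1] is the byte offset of the 11th slash (index 10).
    $_SERVER['REQUEST_URI'] = substr($_SERVER['REQUEST_URI'], 0, $capture[0][10][1]);
}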
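
For reference, both index.php adjustments (stripping the query string and capping the number of slashes) could be folded into one small helper. This is only a sketch: the function name and its default are hypothetical, and it uses explode() rather than preg_match_all().

```php
<?php
// Hypothetical helper, not part of Zend Framework: strips the query string
// and keeps at most $maxSlashes slashes in the remaining path.
function normalizeRequestUri($uri, $maxSlashes = 10)
{
    // Drop the query string, if any.
    $queryIndex = strpos($uri, '?');
    if ($queryIndex !== false) {
        $uri = substr($uri, 0, $queryIndex);
    }

    // Splitting on "/" yields one more piece than there are slashes,
    // so keeping $maxSlashes + 1 pieces keeps at most $maxSlashes slashes.
    $pieces = explode('/', $uri);
    if (count($pieces) > $maxSlashes + 1) {
        $uri = implode('/', array_slice($pieces, 0, $maxSlashes + 1));
    }

    return $uri;
}

echo normalizeRequestUri('/a/b/c?x=1'), "\n";
echo normalizeRequestUri('/a/b/c/d/e', 3), "\n";
```

In index.php this would be applied before the front controller runs, along the lines of `$_SERVER['REQUEST_URI'] = normalizeRequestUri($_SERVER['REQUEST_URI']);`.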
baseten