Given an Apache 2.x web server that uses Content Negotiation (+MultiViews) to access URLs without their extensions (e.g., allow /foo vs. /foo.php or /foo.html), how does one issue a 301 permanent redirect when someone does in fact try to use those extensions?
The goal is for all roads to lead to the sans-extension version of the URL, so /foo/ goes to /foo, and /foo.html goes to /foo. It's the latter one that is proving tricky. (Use case: There are scattered legacy URLs out on the internets that still use the extension. We want those to be permanently redirected.)
There is the canonical link element but, even in the accompanying slides, the suggestion is it's better to do the redirect server-side in the first place.
I've been trying this with mod_rewrite, but it seems like Apache "beats me to it" as it were. It's as if the extension is simply ignored. "No need, I've got it covered" says Apache. But then you can't handle the permanent redirect, and thus extension and no-extension variants are both allowed. Not the desired result. :)
Here's one example. Given a filename of 2-4 lower-case letters, and a test file placed at /foo/file.html, we want to permanently redirect to /foo/file:
Options +MultiViews
RewriteEngine on
...
RewriteRule ^foo/([a-z]{2,4})\.html/$ /foo/$1 [R=301,L]
/foo/file/ and /foo/file.html/ do redirect to /foo/file, but of course /foo/file.html does not. If we try a rule like the following (note the lack of a trailing slash before $):
RewriteRule ^foo/([a-z]{2,4})\.html$ /foo/$1 [R=301,L]
... we end up with too many redirects because Apache acts as if the rule is the following, and so it ends up chasing its own tail:
RewriteRule ^foo/([a-z]{2,4})$ /foo/$1 [R=301,L]
In an attempt to be too clever for my own good, I also tried nested parentheses:
RewriteRule ^foo/(([a-z]{2,4})\.html)$ /foo/$2 [R=301,L]
No dice. Redirect loop city.
What would be really good is to capture this sort of thing "en masse" so I don't have all these special cases floating around in htaccess.
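One direction that may be worth testing, offered as a sketch rather than a confirmed fix: match against %{THE_REQUEST}, the raw request line sent by the client, instead of the rewritten path. The assumption here is that the loop comes from the rules being re-run against the negotiated filename (/foo/file.html) after MultiViews resolves /foo/file. THE_REQUEST does not change during that internal processing, so the condition should fire only when the client literally asked for the extension. The extension list (html|php) is illustrative, not taken from a working config.

```apache
Options +MultiViews
RewriteEngine on

# THE_REQUEST looks like: GET /foo/file.html HTTP/1.1
# Capture the path, minus its extension, from what the client actually sent.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)\.(html|php)[?\ ]
# %1 is the capture from the RewriteCond above; redirect to the bare path.
RewriteRule ^ /%1 [R=301,L]
```

Because only html and php are listed, images and other negotiated types would be left alone, and the pattern is not tied to /foo/ or to a 2-4 letter filename, which would cover the "en masse" case if it behaves as assumed.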
Another SO question began to address this for the single case of handling html files, but the proposed solution ostensibly requires disabling of Content Negotiation, which isn't good if you still want to use it for images and other file extensions (as it is in my case).
Extra credit: We also want to avoid trailing slashes, so if someone tries /foo/ (which could itself be a .html or .php file) it goes to /foo no matter what. The first rule (above) accomplishes this, but I think it's due to +MultiViews. I have my doubts about using DirectorySlash here, as turning it off may carry risks that outweigh the benefit.
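For the trailing-slash case, a possible sketch that leaves DirectorySlash at its default. It assumes, untested, that checking the raw request line (%{THE_REQUEST}) sidesteps whatever MultiViews does to the path internally, and that real directories should keep their slash:

```apache
RewriteEngine on

# Skip real directories so mod_dir's DirectorySlash handling is untouched.
RewriteCond %{REQUEST_FILENAME} !-d
# The client asked for a path ending in a slash, e.g. GET /foo/ HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)/[?\ ]
# %1 is the path without its trailing slash, captured by the last RewriteCond.
RewriteRule ^ /%1 [R=301,L]
```

With this, /foo/ would redirect to /foo only when foo is not an actual directory on disk.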