
I'm working on a website running on a shared Apache v2.2 server, so all configuration is via .htaccess files, and I wanted to use mod_rewrite to map URLs to the filesystem in a less-than-completely-straightforward way. Just for example's sake, let's say that what I wanted to do was this:

  • Map URL www.mysite.com/Alice to filesystem folder /public_html/Bob
  • Map URL www.mysite.com/Bob to filesystem folder /public_html/Alice

Now, after several hours' work carefully designing the ruleset (the real one, not the Alice/Bob one!), I put all my carefully crafted rewriting rules in a .htaccess file in /public_html, and tested it out ...only to get a 500 server error!

I'd been caught out by a well-documented "gotcha!" in Apache: when mod_rewrite rules are used inside a .htaccess file, a rewritten URL is resubmitted for another round of processing (as if it were an external request). That happens so that any rules in the target folder of the rewritten request can be applied, but it can result in some very counter-intuitive behaviour by the webserver!

In the above example, that means that a request for www.mysite.com/Alice/foo.html gets rewritten to /Bob/foo.html, and then resubmitted (internally) to the server as a request for www.mysite.com/Bob/foo.html. This is then re-rewritten back to /Alice/foo.html and resubmitted, which causes it to get re-re-rewritten to /Bob/foo.html, and so on; an infinite loop ensues, broken only when Apache hits its limit on internal redirects (LimitInternalRecursion, 10 by default) and returns the 500 error.
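To make the loop concrete, here is a small Python model of what happens. It is a sketch under stated assumptions, not Apache's implementation: it assumes that every rewrite which changes the path triggers another full pass over the ruleset, that a pass which leaves the path unchanged ends processing, and that the give-up point is 10 internal redirects (the LimitInternalRecursion default). Paths are written without the leading slash, as per-directory rules see them.

```python
import re

# Simplified model of per-directory (.htaccess) rewriting on Apache 2.2.
# Assumptions: whenever a rule changes the path, Apache internally redirects
# and runs the whole ruleset again on the new path; a pass that leaves the
# path unchanged ends processing. 10 mirrors the LimitInternalRecursion default.
MAX_INTERNAL_REDIRECTS = 10

RULES = [
    (re.compile(r"^Alice(/.*)?$"), r"Bob\1"),  # RewriteRule ^Alice(/.*)?$ Bob$1 [L]
    (re.compile(r"^Bob(/.*)?$"), r"Alice\1"),  # RewriteRule ^Bob(/.*)?$ Alice$1 [L]
]


def one_pass(path):
    """Apply the first matching rule and stop, as the [L] flag does."""
    for pattern, target in RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


def serve(path):
    """Return (status, paths seen) for a per-directory path like 'Alice/foo.html'."""
    history = [path]
    redirects = 0
    while True:
        new_path = one_pass(path)
        if new_path == path:
            return 200, history        # nothing changed: serve this path
        redirects += 1
        if redirects > MAX_INTERNAL_REDIRECTS:
            return 500, history        # recursion limit exceeded: give up
        path = new_path
        history.append(path)
```

In this model serve("Alice/foo.html") bounces between Alice/foo.html and Bob/foo.html until the limit is exceeded and returns 500, while a path that no rule touches, such as index.html, is served on the first pass.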


The question is, how to ensure that a .htaccess mod_rewrite ruleset only gets applied ONCE?


The [L] flag in a RewriteRule stops all further rewriting during a single pass through the ruleset, but doesn't stop the entire ruleset from being re-applied after the re-written URL is resubmitted to the server. According to the documentation, Apache v2.3.9+ (currently in Beta) contains an [END] flag that provides precisely this functionality. Unfortunately, the web host is still using Apache 2.2, and they declined my polite request to upgrade to the beta version!
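The difference between the two flags can be sketched with the same kind of simplified Python model. The assumption here is taken from the paragraph above: [L] only ends the current pass, so a changed path is resubmitted and the ruleset runs again, whereas after a rule carrying [END] has matched, the ruleset is not applied again. This models the documented effect, not how Apache implements it internally.

```python
import re

# Simplified model contrasting [L] with [END] in a per-directory ruleset.
# Assumptions: a changed path is resubmitted and the ruleset runs again,
# except after a rule carrying END has matched (END needs Apache 2.3.9+).
MAX_INTERNAL_REDIRECTS = 10


def make_rules(flag):
    """The question's Alice/Bob swap, with every rule carrying the same flag."""
    return [
        (re.compile(r"^Alice(/.*)?$"), r"Bob\1", flag),
        (re.compile(r"^Bob(/.*)?$"), r"Alice\1", flag),
    ]


def serve(path, rules):
    """Return (status, final path) for a per-directory path."""
    redirects = 0
    while True:
        for pattern, target, flag in rules:
            if pattern.match(path):
                new_path = pattern.sub(target, path)
                break
        else:
            return 200, path          # no rule matched on this pass
        if flag == "END" or new_path == path:
            return 200, new_path      # END (or no change): stop rewriting for good
        redirects += 1
        if redirects > MAX_INTERNAL_REDIRECTS:
            return 500, new_path      # [L] alone keeps looping until the limit
        path = new_path
```

With make_rules("END") a request for Alice/foo.html ends up at Bob/foo.html after one pass; with make_rules("L") the same request loops until the model returns 500.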

What's needed is a workaround that provides similar functionality to the [END] flag. My first thought was that I could use an environment variable: Set a flag during the first rewriting pass that would tell subsequent passes to do no further rewriting. If I called my flag variable 'END', the code might look like this:

#  Prevent further rewriting if 'END' is flagged
RewriteCond %{ENV:END} =1
RewriteRule .* - [L]

#  Map /Alice to /Bob, and /Bob to /Alice, and flag 'END' when done
RewriteRule ^Alice(/.*)?$ Bob$1 [L,E=END:1]
RewriteRule ^Bob(/.*)?$ Alice$1 [L,E=END:1]

Unfortunately this code doesn't work: after a bit of experimentation, I discovered that environment variables don't survive the process of resubmitting the rewritten URL to the server. The last line on this Apache documentation page suggests that environment variables ought to survive internal redirects, but I found that not to be the case.

[EDIT: On some servers, it does work. If so, it's a better solution than what follows below. You'll have to try it for yourself on your own server to see.]

Still, the general idea can be salvaged. After many hours of hair-pulling, and some advice from a colleague, I realised that HTTP request headers are preserved across internal redirects, so if I could store my flag in one of those, it might work!


Here's my solution:


# This header flags that there's no more rewriting to be done.
# It's a kludge until use of the END flag becomes possible in Apache v2.3.9+
# ######## REMOVE this directive for Apache 2.3.9+, and change all [...,L,E=END:1]
# ######## to just [...,END] in all the rules below!

RequestHeader set SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj 1 env=END


# If our special end-of-rewriting header is set this rule blocks all further rewrites.
# ######## REMOVE this directive for Apache 2.3.9+, and change all [...,L,E=END:1]
# ######## to just [...,END] in all the rules below!

RewriteCond %{HTTP:SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj} =1 [NV]
RewriteRule .* - [L]


#  Map /Alice to /Bob, and /Bob to /Alice, and flag 'END' when done

RewriteRule ^Alice(/.*)?$ Bob$1 [L,E=END:1]
RewriteRule ^Bob(/.*)?$ Alice$1 [L,E=END:1]

...and it worked! Here's why: inside a .htaccess file, directives associated with various Apache modules execute in the module order defined in the main Apache configuration (or, that's my understanding, anyway...). In this case (and critically for the success of this solution), mod_headers was set to execute after mod_rewrite, so the RequestHeader directive gets executed after the rewrite rules. That means the SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj header gets added to the HTTP request if and only if a RewriteRule with [E=END:1] in its flag list gets matched. On the next pass (after the rewritten request is resubmitted to the server), the RewriteCond at the top of the ruleset detects this header, and the RewriteRule that follows it aborts any further rewriting.
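The mechanism just described can be traced in a simplified Python model. The assumptions come from the observations in this post, not from Apache's source: request headers survive an internal redirect while environment variables start empty on each pass, and within a pass the mod_rewrite rules run before the mod_headers RequestHeader directive.

```python
import re

# Simplified model of the header-flag workaround. Assumptions (from this post's
# observations, not Apache internals):
#   * request headers survive an internal redirect; environment variables don't
#   * within each pass, mod_rewrite runs before mod_headers
HEADER = "SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj"
MAX_INTERNAL_REDIRECTS = 10

RULES = [
    (re.compile(r"^Alice(/.*)?$"), r"Bob\1"),
    (re.compile(r"^Bob(/.*)?$"), r"Alice\1"),
]


def one_pass(path, headers):
    env = {}  # environment variables start empty on every pass
    # RewriteCond %{HTTP:<HEADER>} =1 [NV] followed by RewriteRule .* - [L]
    if headers.get(HEADER) == "1":
        return path
    # RewriteRule ... [L,E=END:1]
    for pattern, target in RULES:
        if pattern.match(path):
            path = pattern.sub(target, path)
            env["END"] = "1"
            break
    # RequestHeader set <HEADER> 1 env=END  (mod_headers, after mod_rewrite)
    if "END" in env:
        headers[HEADER] = "1"
    return path


def serve(path, client_headers=None):
    """Return (status, final path); the headers dict persists across passes."""
    headers = dict(client_headers or {})
    for _ in range(MAX_INTERNAL_REDIRECTS + 1):
        new_path = one_pass(path, headers)
        if new_path == path:
            return 200, path
        path = new_path
    return 500, path
```

Here serve("Alice/foo.html") stops at Bob/foo.html on the second pass. Calling serve("Alice/foo.html", {HEADER: "1"}) shows the side effect discussed in note 2: a client that sends the header itself gets no rewriting at all.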

Some things to note about this solution are:

  1. It won't work if Apache is configured to run mod_headers before mod_rewrite. (I'm not sure if that's even possible, or if so, how unusual it'd be).

  2. If an external user includes a SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj header in their HTTP request to the server, it'll disable all URL rewriting rules, and that user will see the filesystem directory structure "as-is". That's the reason for the random string of ASCII characters at the end of the header name: it makes the header hard to guess. Whether this is a feature or a security vulnerability depends on your point of view!

  3. The idea here was a workaround to mimic the use of the [END] flag in Apache versions that don't yet have it. If all you wanted was to ensure your ruleset only runs once, regardless of which rules are triggered, then you could probably drop the use of the 'END' environment variable and just do this:

    RewriteCond %{HTTP:SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj} =1 [NV]
    RewriteRule .* - [L]
    
    RequestHeader set SPECIAL-HEADER-STOP-FURTHER-REWRITES-kjhsdf87653vasj 1
    
    #  Map /Alice to /Bob, and /Bob to /Alice
    RewriteRule ^Alice(/.*)?$ Bob$1 [L]
    RewriteRule ^Bob(/.*)?$ Alice$1 [L]
    

    Or even better, this (though the REDIRECT_* variables are poorly documented in the Apache v2.2 documentation; they seem to be mentioned only here, so I can't guarantee it'd work on all versions of Apache):

    RewriteCond %{ENV:REDIRECT_STATUS} !^$
    RewriteRule .* - [L]
    
    #  Map /Alice to /Bob, and /Bob to /Alice
    RewriteRule ^Alice(/.*)?$ Bob$1 [L]
    RewriteRule ^Bob(/.*)?$ Alice$1 [L]
    

    However, once you're running Apache v2.3.9+, I expect that using the [END] flag would be more efficient than the above solution, because (presumably) it altogether avoids the rewritten URL being re-submitted to the server for another rewriting pass.

    Note that you may also want to block rewriting of subrequests, in which case you can add a RewriteCond to the don't-do-any-more-rewriting rule, like this:

    RewriteCond %{ENV:REDIRECT_STATUS} !^$ [OR]
    RewriteCond %{IS_SUBREQ} =true
    RewriteRule .* - [L]
    
  4. The idea here was a workaround to replace the use of the [END] flag in Apache versions that don't yet have it. But in fact you can use this general approach to store more than just a single flag: you could store arbitrary strings or numbers that would persist across an internal server redirect, and design your rewrite rules to depend on them using any of the test conditions RewriteCond provides. (I can't, off the top of my head, think of a reason why you'd want to do that... but hey, the more flexibility and control you have, the better, right?)
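The REDIRECT_STATUS and IS_SUBREQ variant from note 3 can be sketched in the same simplified Python model. Its assumptions come from the discussion on this page rather than from the Apache documentation: REDIRECT_STATUS is empty on the first pass and "200" on a request that has been internally redirected, and IS_SUBREQ is "true" for subrequests (modelled here as a boolean). Unlike the header flag, nothing in this check is taken from the client's request.

```python
import re

# Simplified model of note 3's "run the ruleset only once" variant.
# Assumptions (from this page's discussion, not the Apache docs):
#   * REDIRECT_STATUS is "" on the first pass, "200" after an internal redirect
#   * IS_SUBREQ is true for subrequests (modelled as a bool)
RULES = [
    (re.compile(r"^Alice(/.*)?$"), r"Bob\1"),
    (re.compile(r"^Bob(/.*)?$"), r"Alice\1"),
]


def one_pass(path, redirect_status, is_subreq):
    # RewriteCond %{ENV:REDIRECT_STATUS} !^$ [OR]
    # RewriteCond %{IS_SUBREQ} =true
    # RewriteRule .* - [L]
    if redirect_status != "" or is_subreq:
        return path
    for pattern, target in RULES:      # RewriteRule ... [L]
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


def serve(path, is_subreq=False, max_redirects=10):
    """Return (status, final path) for a per-directory path."""
    redirect_status = ""               # first pass: no internal redirect yet
    for _ in range(max_redirects + 1):
        new_path = one_pass(path, redirect_status, is_subreq)
        if new_path == path:
            return 200, path
        path = new_path
        redirect_status = "200"        # the redirected request carries a status
    return 500, path
```

In this model the main request for Alice/foo.html is rewritten once to Bob/foo.html, and a subrequest for the same path is left alone.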


I guess anyone who's read this far has figured out that I'm not really asking a question here. It's more a matter of my having found my own solution to a problem I had, and wanting to post it here for reference in case anyone else has run into the same problem. That's a big part of what this website is for, right?

...

But since this is supposed to be a question-and-answer forum, I'll ask:

  • Can anyone see any potential problems with this solution (other than those I've already mentioned)?
  • Or does anyone have a better way of achieving the same thing?
Doin
  • [Answering](http://meta.stackexchange.com/q/17463/133817) your own question is fine (you can even [accept](http://blog.stackoverflow.com/2009/01/accept-your-own-answers/) it, though without gaining reputation, and potentially earn a [badge](http://stackoverflow.com/badges/14/self-learner) for it); however, it should be posted as an answer, not within the question body. Please edit your question and post the answer separately. – outis Apr 20 '12 at 05:46

3 Answers


Depending on your Apache build, this condition may work. As written, it matches only on the first pass, so it suits a specific problematic rule; to use it in the "stop-rewriting" rule (i.e. RewriteRule .* - [L]) instead, negate the pattern to !^$:

RewriteCond %{ENV:REDIRECT_STATUS} ^$

REDIRECT_STATUS will be empty on the very first (initial) pass and will have a value of 200 (or maybe another value as well; I have not checked that deeply) on any subsequent pass.

Unfortunately it works on some systems and not on others, and I personally have no idea what makes it work.

Other than this, the most common approach is to add a rewrite condition that checks the original URL, for example by parsing the %{THE_REQUEST} variable, e.g.:

RewriteCond %{THE_REQUEST} ^[A-Z]+\s.+\.php\sHTTP/.+

This only makes sense for individual problematic rules, though.
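A quick way to see what that THE_REQUEST pattern accepts is to run it through Python's re module. THE_REQUEST holds the full request line the client sent (for example "GET /index.php HTTP/1.1") and is not changed by internal rewrites, which is why it can be used to look at the original URL. The assumption here is that Python's regex engine treats these simple constructs (\s, .+, \.) the same way Apache's PCRE does.

```python
import re

# The example RewriteCond pattern, written as a Python regex.
# Assumption: Python's re matches \s, .+ and \. the same way Apache's PCRE does.
THE_REQUEST_PHP = re.compile(r"^[A-Z]+\s.+\.php\sHTTP/.+")


def original_request_was_php(the_request):
    """True if the client's original request line asked for a .php URL."""
    return THE_REQUEST_PHP.match(the_request) is not None
```

Note that the pattern needs whitespace right after ".php", so a request line such as "GET /index.php?x=1 HTTP/1.1" (with a query string) does not match.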

In general, you should avoid such "rewrite A -> B and then B -> A" situations (I'm pretty sure you are aware of that).

As for your own solution: "if it ain't broke, don't fix it". If it works, then it's great, as I do not see any major problems with such an approach.

LazyOne
  • Yes, the very first thing I tried was looking for some kind of predefined Apache variable that would tell me whether or not I was in an internal redirect. It's certainly information Apache _ought_ to be providing, and if you _can_ access that information in a RewriteCond, that's definitely better than using my solution above. But after redirecting to a PHP script that prints all the environment variables, I realized that none of them were indicating an internal redirect had occurred; no REDIRECT_* variables were defined. As you say, it all seems to depend on the Apache configuration. – Doin Oct 18 '11 at 11:14
  • Oh, and yeah - of course the rewrite A->B and then B->A thing was just an example. You'd not normally do that, exactly. More realistically you might (for example) want to map a certain group of files _out_ of the root URL space to a filesystem subdirectory, and map another (different) group of files from a URL subdirectory to the filesystem web_root, and then similar problems can occur. – Doin Oct 18 '11 at 11:20
  • @Doin Yes, it is indeed supposed to be `!^$` in this example. I'm just using other way around as I rarely have such loop situations. As for `REDIRECT_*` variables -- look for them under `$_SERVER[]`. I definitely see them in 404 script when redirecting to non-existing resource by mistake (I just do not currently have Apache next to me to verify) – LazyOne Oct 18 '11 at 13:08
  • I did look in $_SERVER. They weren't there, or I'd have used them and saved myself hours and hours of frustration! I'm guessing it's possible to configure Apache not to create them? – Doin Oct 20 '11 at 01:54
  • Well, I looked again, and there actually _are_ some REDIRECT_* variables: REDIRECT_STATUS is defined and is "200" **regardless** of whether I've redirected or not. If I have redirected, I also get REDIRECT_END (= "1") and REDIRECT_URL. However, REDIRECT_URL is the same as REQUEST_URI, and both contain the original (non-redirected) URL. That's why I never noticed I could use it before. – Doin Oct 20 '11 at 07:40
  • I looked in the Apache documentation, and the REDIRECT_* variables aren't mentioned, except (very unhelpfully) on the [custom error responses](http://httpd.apache.org/docs/2.2/custom-error.html) page, which leads me to think that they're not a fully-supported feature, or may change between minor Apache version numbers. I'm not sure, but I'd be reluctant to depend on them staying the same when changing servers, or after upgrading Apache. – Doin Oct 20 '11 at 07:49
  • Still, if they do work as I've described, you could ensure that your mod_rewrite ruleset only gets applied once by including the following 2 lines at the top: `RewriteCond %{ENV:REDIRECT_URL} !^$`, `RewriteRule .* - [L]`. It's probably a better solution than defining custom headers as I've done. – Doin Oct 20 '11 at 07:55
  • Well I tested the `REDIRECT_URL` version in my actual .htaccess file, and it didn't work. But the `REDIRECT_STATUS` one does - evidently this variable gets defined _after_ the mod_rewrite ruleset executes the first time - so I've edited the post again to reflect this. – Doin Oct 20 '11 at 08:46
  • I needed to change public links while keeping the same old file names. I had been looking for a solution for more than an hour, and this is the solution I was looking for. I redirect old links to new ones as permanent redirects with this condition, after that I rewrite new links to old ones, and finally I apply a final pattern (in the same run). – Loenix Dec 04 '13 at 08:30

I'm not too sure why you need to do that, but I'd suggest a couple of things for users who run into such a situation:

  1. How about renaming the folder Bob to Alice and vice versa? Then Apache doesn't need to do anything about them.

  2. If that's important for your application, could you just change the app to detect Bob and Alice and swap them there instead?

In PHP it would be something like this:

// $path is assumed to hold the first URL segment, e.g. "Bob" or "Alice"
if ($path == "Bob") {
  $path = "Alice";
}
else if($path == "Alice") {
  $path = "Bob";
}

Done.

Otherwise, adding another sub-folder could be useful: /Bob becomes /a/Alice and /Alice becomes /b/Bob, which removes the confusion. That could also be done with another parameter (query string), which is more or less what you're doing by setting an environment variable that you test in your .htaccess.
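Why the sub-folder idea breaks the loop can be shown with the same simplified Python model used for the question, assuming (as there) that the ruleset is re-run after every rewrite that changes the path. Because the targets live under the a/ and b/ prefixes, no rewritten path can match the anchored ^Bob or ^Alice patterns again, so the second pass changes nothing and processing stops.

```python
import re

# Simplified model of the sub-folder suggestion. Assumption: the ruleset is
# re-run after each rewrite that changes the path, as in the question.
RULES = [
    (re.compile(r"^Bob(/.*)?$"), r"a/Alice\1"),   # /Bob   -> /a/Alice
    (re.compile(r"^Alice(/.*)?$"), r"b/Bob\1"),   # /Alice -> /b/Bob
]


def serve(path, max_redirects=10):
    """Return (status, final path) for a per-directory path."""
    for _ in range(max_redirects + 1):
        new_path = path
        for pattern, target in RULES:
            if pattern.match(path):
                new_path = pattern.sub(target, path)
                break
        if new_path == path:
            return 200, path       # no anchored pattern matches any more
        path = new_path
    return 500, path
```

For example, Bob/foo.html is served from a/Alice/foo.html after a single rewrite, with no loop.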

Alexis Wilke

Variables set by the RewriteRule (the one that modified the path) are available on the "next round" (the internal redirect) with the prefix REDIRECT_ prepended. So your first code snippet should look like this:

RewriteCond %{ENV:REDIRECT_END} =1
RewriteRule .* - [L]

This works for me with Apache 2.4.10.
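The renaming this answer relies on can be sketched as a tiny Python model. It assumes, as described here, that on an internal redirect each environment variable from the previous pass reappears with a REDIRECT_ prefix and the unprefixed name is no longer set. It models the described behaviour only; it is not Apache code.

```python
# Simplified model of REDIRECT_ renaming on an internal redirect (assumption:
# every variable from the previous pass reappears with a REDIRECT_ prefix, and
# the unprefixed name is no longer set).


def env_after_internal_redirect(previous_env):
    """Environment seen by the next pass of the ruleset."""
    return {"REDIRECT_" + name: value for name, value in previous_env.items()}


def stop_rule_matches(env):
    """RewriteCond %{ENV:REDIRECT_END} =1 followed by RewriteRule .* - [L]."""
    return env.get("REDIRECT_END") == "1"


first_pass = {"END": "1"}                        # set by [E=END:1]
second_pass = env_after_internal_redirect(first_pass)
```

In this model second_pass contains REDIRECT_END but no END, which is consistent with the question's observation that a check on %{ENV:END} never fired on the second pass.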

Sergio
  • It's a moot point on Apache 2.4 and later - you're better off just using the [END] flag. The trick with environment variables was necessary for versions prior to 2.3.9 only. – Doin Feb 27 '16 at 13:27
  • Also, the prepending of REDIRECT_ to the environment variables seems to be poorly documented in the official docs, though I did find it mentioned here: https://httpd.apache.org/docs/2.4/custom-error.html for error redirects specifically. According to this page: https://httpd.apache.org/docs/2.4/rewrite/advanced.html the way my code snippet does it is in fact correct (scroll to the bottom of the page - they do the same thing, essentially). – Doin Feb 27 '16 at 13:35
  • For more information on the prepending of REDIRECT_ to environment variables in Apache, see this stackoverflow question: https://stackoverflow.com/questions/3050444/when-setting-environment-variables-in-apache-rewriterule-directives-what-causes – Doin Feb 27 '16 at 13:39