This question is a re-opening of RewriteRule - Caret ^ - Match because the actual question has not been answered by the accepted answer.

I am confused about these rewrite rules in a .htaccess file, which are supposed to redirect all requests for non-existent files and directories to the front controller:

# Send Requests To Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]

The original question was: How is it possible that a caret can match the whole URL if it is a position anchor?

Please note the emphasized word "whole". I know that the caret matches the beginning of a line and thus the rule always matches, but the caret does not consume any characters and according to the official Apache docs, "the Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern."

Moreover, if the .htaccess file is placed in a directory, the leading / of the URL which represents the directory, is not part of the match (again, see Apache docs, 2nd bullet point below "What is matched").

In summary, if the URL is something like https://my-domain.tld/api/foo, the relative URL seen by the rewrite rule is api/foo, the caret ^ matches the beginning, and after substitution we end up with index.phpapi/foo. Essentially, index.php is put in front of the original URL.

How does this work? A file named index.phpapi/foo does not exist. I would have expected a 404 Not Found result code.

user2690527

1 Answer

# Send Requests To Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]

As you state, the caret (^) does not actually match anything. It simply asserts the start-of-string, which will be successful for everything (every requested URL-path). And that is all it needs to do here... be successful.

The regex $ (end-of-string anchor) would have the same result, as would .? (an optional single character), etc.

> but the caret does not consume any characters and according to the official Apache docs, "the Substitution of a rewrite rule is the string that replaces the original URL-path that was matched by Pattern."

It doesn't literally just replace the part that is "matched". It replaces the entire URL-path that is matched by (or satisfies) the pattern. The pattern ^ successfully matches (or satisfies) everything.

> Moreover, if the .htaccess file is placed in a directory, the leading / of the URL which represents the directory, is not part of the match (again, see Apache docs, 2nd bullet point below "What is matched").

Yes. (Although, strictly speaking, it is the directory-prefix that is removed. And the directory-prefix always ends in a slash. The directory-prefix is the absolute filesystem path of the location of the .htaccess file.)

> and after substitution we end up with index.phpapi/foo. Essentially, index.php is put in front of the original URL.

No, that is not what happens.

As noted above, the substitution string replaces the entire URL-path on success. index.php replaces api/foo in its entirety. api/foo successfully matched (or satisfied) the regex ^.
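To make this concrete, here is a minimal sketch of how the rule treats a request for /api/foo. It assumes the .htaccess file sits in the document root, so the URL-path seen by the rule is api/foo. The `^api` variant in the comments is hypothetical and only there for comparison.

```apache
RewriteEngine On

# Request: /api/foo  ->  URL-path seen by RewriteRule in .htaccess: api/foo
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f

# The pattern ^ is satisfied by api/foo (and by any other URL-path).
# The substitution then replaces the whole URL-path:
#   api/foo  ->  index.php      (not index.phpapi/foo)
RewriteRule ^ index.php [L]

# Hypothetical variant: this pattern matches only the characters "api",
# yet the result would still be index.php, not index.php/foo.
# RewriteRule ^api index.php [L]
```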

If you literally wanted to replace just the part of the URL-path that is matched by (part of) the RewriteRule pattern then you would need to manually reconstruct the entire URL-path by capturing the other parts of the URL-path. (This is a common task when you want to replace just a single word in the requested URL-path.)
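As a rough illustration of that kind of reconstruction, the sketch below swaps one path segment and keeps everything around it. The segment names `old` and `new` and the example path are made up for this example, and the pattern assumes the segment appears in the middle of the URL-path.

```apache
RewriteEngine On

# Hypothetical example: replace the path segment "old" with "new",
# e.g.  shop/old/item-1  ->  shop/new/item-1
#   $1 = everything before "/old/"  (here: shop)
#   $2 = everything after  "/old/"  (here: item-1)
RewriteRule ^(.*)/old/(.*)$ $1/new/$2 [L]
```

The two captured groups are what carry the unmatched parts of the URL-path into the result. Without them, the substitution would replace the whole URL-path with just the literal text.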

> end up with index.phpapi/foo

To do that you would indeed need to match everything, capturing a backreference and constructing the URL-path. For example:

:
RewriteRule (.*) index.php$1 [L]

But as you say, this will likely result in a 404.


Aside:

Strictly speaking, ^ is not optimal here, because it is also successful for the directory itself (an empty URL-path). However, the first condition (RewriteCond directive) excludes directories, so the rule is not applied in that case anyway. The pattern does not need to be successful for literally everything, just for everything other than the directory itself. For example, the following would be an improvement (i.e. it fails early):

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L]

This "matches" just a single character. It does nothing with this match; it is simply "successful". It fails to match the directory itself (but the first condition would also cause the rule to fail).

This rule does not need to rewrite the directory to index.php because mod_dir issues a subrequest for index.php (the DirectoryIndex) when the directory itself is requested.
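For reference, the mod_dir behaviour mentioned here is driven by the DirectoryIndex directive. The line below is only a sketch of what is assumed to be in effect, either explicitly as shown or through the server configuration. It also assumes that mod_dir is enabled and that index.php exists in the same directory as the .htaccess file.

```apache
# A request for the directory itself has an empty URL-path, so it is not
# rewritten by "RewriteRule . index.php". mod_dir then serves index.php
# through an internal subrequest.
DirectoryIndex index.php
```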

MrWhite
  • Thanks for the good explanation. This part summarizes it: "It doesn't literally just replace the part that is "matched". It replaces the entire URL-path that is matched by (or satisfies) the pattern". IMHO the Apache Docs are very imprecise and unclear in this point. In particular, as the behavior is counter-intuitive and in contrast to how other regex replacements work. (For example, if I do a search-and-replace in my standard text editor and replace `^` by `foo`, then `foo` is put in front of every line.) IMHO, the Apache docs should have stressed this fact and explicitly pointed that out. – user2690527 Apr 13 '22 at 17:50
  • @user2690527 "...in contrast to how other regex replacements work" - to be clear, this is not a "regex replacement". The `RewriteRule` _pattern_ is essentially just an expression that determines whether the rule should be processed or not. You could, for instance, have a "negated" _pattern_ that is successful when the _pattern_ does not match. The [mod_rewrite introduction](https://httpd.apache.org/docs/current/rewrite/intro.html#rewriterule) may provide a better overview for some of these points. But yes, the Apache docs can be a little unclear in places. – MrWhite Apr 13 '22 at 23:11