
I'm trying to assemble a regular expression to rewrite a URL containing uppercase characters to the same URL but in all lowercase.

Example:

example.com/foO-BAR-bAz rewrite to example.com/foo-bar-baz

example.com/FOO-BAR-BAZ rewrite to example.com/foo-bar-baz

example.com/foo-bar-baz does not match

I tried ^\/(?=.*[A-Z]) to match a string with at least one uppercase character but it doesn't match the full string. I also know that I need to use a "capturing group" but I'm not sure how.

I will be implementing this redirect rule in the .htaccess file of an Apache server.

MrWhite
yevg
  • https://www.askapache.com/htaccess/rewrite-uppercase-lowercase/#Htaccess_Redirect_Uppercase_Lowercase – noah Nov 10 '20 at 01:10
  • https://serverfault.com/questions/177647/convert-and-redirect-url-in-uppercase-to-lowercase-using-htaccess – noah Nov 10 '20 at 01:10
  • This (specifically, the replacing part) isn’t something that should be done in .htaccess. It would be a perfect use case for [RewriteMap](https://httpd.apache.org/docs/2.4/rewrite/rewritemap.html#int) using the internal `tolower` function, but that needs access to the server config/virtual host. If that is not an option, then I would rewrite those requests to a tiny script (PHP, or whatever you have available) that takes care of the lower-casing and issues an external redirect. – CBroe Nov 10 '20 at 07:43
  • And a general hint: instead of asking how to do this, have you thought about actually reading a bit about regular expressions? I mean, they are 1. really well documented and 2. one of the fundamental things a programmer should know about... – arkascha Nov 10 '20 at 08:24
  • @CBroe On Apache 2.4 you can use the `tolower()` function in `.htaccess` with an Apache expression in a `RewriteCond` directive. (See [my answer](https://stackoverflow.com/a/64779809/369434) below.) – MrWhite Nov 11 '20 at 03:25

1 Answer


If you are on Apache 2.4 then in .htaccess you can use mod_rewrite with an Apache expression and make use of the tolower() function. (This doesn't require the use of a RewriteMap in the server config, as mentioned in comments.) For example:

RewriteEngine On

RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] %1 [L]

The RewriteRule pattern simply checks that there is at least one uppercase letter in the requested URL-path. The RewriteCond directive then calls the tolower() function on the URL-path (the REQUEST_URI server variable), and the regex /(.*)/ captures the entire result. The %1 backreference in the substitution string therefore holds the result of the tolower() call, i.e. the lowercased URL-path, and the request is internally rewritten to it.

To "correct" the URL and issue an external redirect, just add the R flag to the RewriteRule directive. For example:

:
RewriteRule [A-Z] %1 [R=301,L]

UPDATE: To eliminate a double redirect when also redirecting HTTP to HTTPS (and/or non-www to www), include the full canonical URL as part of this rule and implement the canonical (scheme + hostname) redirects second.

For example:

# 1 - Upper to lowercase conversion (and HTTPS and WWW)
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] https://www.example.com%1 [R=301,L]

# 2 - HTTP to HTTPS
:

Note that the %1 backreference already includes the slash prefix at the start of the URL-path, so no slash is added after the hostname in the substitution string.

However, a double redirect is not necessarily incorrect in this situation, i.e. redirecting HTTP to HTTPS (same hostname and URL-path) first and then canonicalising the other elements of the requested URL (hostname, upper/lowercase URL-path, etc.) second. These should be edge cases to begin with, so the real-world impact is minimal.

Note that if you are implementing HSTS then you are required to redirect from HTTP to HTTPS on the same hostname first, before canonicalising the hostname (i.e. www vs non-www). In this case you should use the HTTP_HOST server variable (i.e. %{HTTP_HOST}) as the hostname in the above redirect. A double redirect cannot be avoided in this scenario.
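
As a rough sketch of that HSTS-friendly ordering, the rules might be arranged as follows. Only rule 1 comes from this answer (with %{HTTP_HOST} swapped in for the hostname). Rules 2 and 3 are assumptions added for illustration: www.example.com is a placeholder canonical hostname, and the %{HTTPS} check assumes the variable is reliable on your server, which may not be the case behind an SSL-terminating proxy.

```apache
RewriteEngine On

# 1 - Upper to lowercase conversion (and HTTPS), keeping the requested hostname
#     so that the first redirect stays on the same host, as HSTS requires
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] https://%{HTTP_HOST}%1 [R=301,L]

# 2 - HTTP to HTTPS on the same hostname (assumed rule, not from the answer)
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# 3 - Canonical hostname, non-www to www (assumed rule, placeholder hostname)
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

With this arrangement, a request for http://example.com/FOO would first be redirected to https://example.com/foo and then to https://www.example.com/foo, which is the unavoidable double redirect mentioned above. Test with 302 redirects before switching to 301 to avoid caching problems.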

MrWhite
  • @anubhava Thanks. Although I do wonder if there is a "better" method to get the result of the function into the _substitution_ string? Comparing the result to a capturing regex and using the resulting backreference in the substitution, whilst very flexible, feels a bit "hacky" when you just need the entire result. – MrWhite Nov 11 '20 at 12:42
  • @yevg How did you get on with this? – MrWhite Nov 24 '20 at 16:48
  • This gives me a double redirect: from https:// UPPERCASE_URL / to http:// lowercase_url, and then to the secure https:// lowercase_url – theblackpost Aug 19 '22 at 13:22
  • @theblackpost If you are seeing a redirect from HTTPS to HTTP and back to HTTPS then that would seem to be an error with _your_ directives. You can eliminate a double redirect (HTTP to HTTPS _and_ upper to lowercase) by simply including the absolute URL as part of this (upper to lowercase) rule and implementing the HTTP to HTTPS redirect second. I've updated my answer. – MrWhite Aug 19 '22 at 15:48