
I have updated Apache today (to 2.4.56-1) and a load of .htaccess rewrites that used to work are now getting AH10411 errors, relating to spaces in the query string. I'm struggling to find a 'proper' solution.

The user clicks on a link such as <a href='FISH%20J12345.6-78919'>clickme</a> - as you can see the space in the link URL has been encoded as %20.

The .htaccess file in the relevant server directory contains this directive, which is the one being executed:

RewriteRule ^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1 [L,QSA]

(In the above I am checking for spaces, not %20, as the browser seems to be converting it to space before it makes it to this rule).

This was working until I updated Apache; now users get a 403 error, and my Apache error log reports:

AH10411: Rewritten query string contains control characters or spaces

This appears to be a new error, because Googling it finds nothing!

Editing my pages to (for example) change the space to an underscore and handle it correctly is not really an option, as the design is intended to support users being able to enter a URL directly using the name of the object they care about. So far, the only workaround I've found is a bit ugly, namely capturing the two parts of the source name separately in the regexp, thus:

RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]
                  ^   ^                                               ^^^

(I tried $1%20$2 at the end, which also resulted in the same error.)

Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?

MrWhite
Phil Evans


> (I tried $1%20$2 at the end, which also went badly).

This looks like a bug. Encoding the space as %20 in the query string should be valid. You can also encode the space as + in the query string (as in your workaround).
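Both encodings should decode to the same value on the PHP side. Below is a quick check. It uses Python's `urllib.parse.parse_qs` as a stand-in for PHP's query-string parsing, on the assumption that the two behave the same for simple values like this one:

```python
from urllib.parse import parse_qs

# parse_qs stands in for PHP's $_GET parsing: both treat "+" and "%20"
# in a query string as an encoded space.
plus_form = parse_qs("sourceName=FISH+J12345.6-78919")
pct_form = parse_qs("sourceName=FISH%20J12345.6-78919")

print(plus_form)              # {'sourceName': ['FISH J12345.6-78919']}
print(plus_form == pct_form)  # True
```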

In your original rule, Apache should be encoding the space (as %20) when making the internal rewrite (since a literal space is not valid in the URL). However, it would seem Apache is then baulking at the encoded space?!

You can also try using the B flag in your original rule. The B flag tells mod_rewrite to URL-encode the backreference before applying it to the substitution string. However, this would seem to depend on Apache encoding the space as + in the query string (as opposed to %20, which it would ordinarily do). Certainly in earlier versions of Apache, this would only have resulted in Apache encoding the space as %20 (not +). However, since version 2.4.26 Apache has a new flag BNP (backrefnoplus), which explicitly tells Apache not to use a +, so you would think that by default it would use a +. (Unfortunately, I can't test this myself at the moment.)

For example:

RewriteRule ^(FISH\s*J[\d.]+-?\+?\d+)$ myPage.php?sourceName=$1 [B,QSA,L]

(Minor point... no need to backslash-escape the literal dot when used inside a regex character class. I also reduced the digit ranges to the shorthand \d.)
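The following is a rough model of what the B flag does to the backreference. It is a simplified sketch, not Apache's code. It assumes that ASCII letters, digits and `_` are copied unchanged, a space becomes + (or %20 when BNP is set, modelled here by a hypothetical `noplus` parameter), and every other byte becomes a lowercase %xx escape. That assumption is consistent with the rewrite:trace6 output `FISH+J12345%2e6%2d78919` for this exact name.

```python
from urllib.parse import parse_qs


def escape_backref(value: str, noplus: bool = False) -> str:
    """Simplified model of the B flag (assumption, not Apache's source).

    Keeps ASCII alphanumerics and "_", turns a space into "+" (or "%20"
    when `noplus` models the BNP flag) and %-encodes every other UTF-8
    byte using lowercase hex.
    """
    out = []
    for b in value.encode("utf-8"):
        ch = chr(b)
        if (ch.isascii() and ch.isalnum()) or ch == "_":
            out.append(ch)
        elif ch == " " and not noplus:
            out.append("+")
        else:
            out.append("%{:02x}".format(b))
    return "".join(out)


name = "FISH J12345.6-78919"  # $1 as captured (already %-decoded by Apache)
query = "sourceName=" + escape_backref(name)
print(query)            # sourceName=FISH+J12345%2e6%2d78919
print(parse_qs(query))  # {'sourceName': ['FISH J12345.6-78919']}
```

Either way the rewritten query string no longer contains a literal space, and the value round-trips to the original name once PHP decodes it.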

Aside: Can you have both - and + before the last set of digits? It looks like it should perhaps be one or the other (or nothing at all), e.g. [-+]?.
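To see the difference, you can run both patterns against a few sample names. This uses Python's `re` and assumes it behaves like Apache's PCRE for these simple patterns on ASCII input. The sample names are made up for illustration.

```python
import re

original = r"^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$"
suggested = r"^(FISH\s*J[\d.]+[-+]?\d+)$"

samples = ["FISH J12345.6-78919", "FISHJ12345.6+78919", "FISH J12345.6-+78919"]
results = {s: (bool(re.match(original, s)), bool(re.match(suggested, s))) for s in samples}

for s, (orig_ok, new_ok) in results.items():
    print(f"{s!r:26} original={orig_ok} suggested={new_ok}")
```

Only the last sample, with both signs, is treated differently: the original pattern accepts it and `[-+]?` rejects it.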

> Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?

Not really (although your solution is not strictly correct; see below). In your particular example, which only contains spaces, you shouldn't need to do anything, as mod_rewrite should automatically URL-encode any URL that is not valid. (There is an NE (noescape) flag to explicitly prevent mod_rewrite from doing this, which is sometimes necessary to prevent already-encoded characters from being doubly encoded.) You can always use the B flag in URL rewrites of this form (as mentioned above). You would need to use the B flag if there were other special characters, such as & (a special character in the query string), which would not otherwise be URL-encoded, effectively resulting in the URL parameter value being truncated.
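The truncation is easy to demonstrate. The captured value `FISH&CHIPS` below is hypothetical. `parse_qs` stands in for PHP's query parsing, and `urllib.parse.quote(..., safe="")` stands in for the B flag's escaping. It differs in details such as uppercase hex and %20 for spaces, but those do not change the decoded result.

```python
from urllib.parse import parse_qs, quote

captured = "FISH&CHIPS"  # hypothetical $1 containing a literal "&"

# Without B: "&" is copied into the query string as-is and acts as a separator,
# so "CHIPS" becomes a separate (value-less, hence dropped) parameter.
without_b = parse_qs("sourceName=" + captured)

# With B-style escaping: "&" becomes %26 and survives as part of the value.
with_b = parse_qs("sourceName=" + quote(captured, safe=""))

print(without_b)  # {'sourceName': ['FISH']}
print(with_b)     # {'sourceName': ['FISH&CHIPS']}
```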

> RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]

An issue with your solution is that you are allowing 0 (ie. "none") or more spaces in the request and enforcing a single space in the resulting URL parameter. This is not the same as your original directive, that would preserve the spaces (or lack of) from the original request.
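The following sketch compares what each rule would deliver to the page. It reuses the two patterns from the question and treats `parse_qs` as a stand-in for PHP decoding the `$1+$2` substitution. The helper name `compare` is hypothetical.

```python
import re
from urllib.parse import parse_qs

original = re.compile(r"^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$")
workaround = re.compile(r"^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$")


def compare(path: str):
    """Return (value kept by the original rule, value PHP sees with $1+$2)."""
    kept = original.match(path).group(1)
    m = workaround.match(path)
    # Substitution "sourceName=$1+$2": the "+" decodes to exactly one space.
    query = "sourceName={}+{}".format(m.group(1), m.group(2))
    return kept, parse_qs(query)["sourceName"][0]


for path in ["FISHJ12345.6-78919", "FISH  J12345.6-78919"]:
    print(compare(path))
```

Both inputs come out of the workaround as `FISH J12345.6-78919` (one space), whereas the original capture preserves zero or two spaces as typed.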

Could there be 0 or more spaces in the initial request?

If yes, and these need to be preserved then it may just be easier to repeat this rule for as many "spaces" as you need. You could implement a search/replace, but that may be overkill.

> (In the above I am checking for spaces, not %20, as the browser seems to be converting it to space before it makes it to this rule).

The URL-path that the RewriteRule pattern matches against is first URL-decoded (%-decoded), which is why you need to match against a literal space and not %20. This has nothing to do with the "browser". Any literal spaces in the URL-path "must" be URL-encoded as %20 in the HTTP request that leaves the browser/user-agent otherwise it's simply not valid.
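The sketch below shows what the pattern actually sees for the example request. It assumes the .htaccess file sits in the document root, so that the per-directory prefix stripped before matching is just the leading `/`. `urllib.parse.unquote` models Apache's %-decoding of the path.

```python
import re
from urllib.parse import unquote

# What travels in the HTTP request line (a literal space would be invalid here).
request_path = "/FISH%20J12345.6-78919"

# mod_rewrite matches against the %-decoded path, minus the directory prefix
# (assumed to be "/" because the .htaccess is assumed to be in the document root).
matched_path = unquote(request_path)[1:]

print(repr(matched_path))                           # 'FISH J12345.6-78919'
print(bool(re.match(r"^FISH\s*J", matched_path)))  # True: a literal space is present
print(bool(re.match(r"^FISH%20", matched_path)))   # False: no "%20" is left to match
```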


(UPDATE) Restrict which non-alphanumeric characters are encoded

There was a comment (since deleted) where the user was also passing a + (literal plus) in the URL-path and seemingly expecting this to be passed as-is to the query string (via an internal rewrite) which would then be seen as an encoded space. However, the use of the B flag (as above) would result in the literal + being URL encoded as %2b thus preserving the literal + - which would ordinarily be the correct behaviour. However, if the + should be copied as-is and thus seen as an encoded space (not a literal +) in the resulting query string then you can restrict the non-alphanumeric characters that the B flag will encode (requires Apache 2.4.26+). ie. Exclude the +.

For instance, you could limit the encoding to spaces and ? only. For example:

RewriteRule ^(.+)$ index.php?query=$1 "[B= ?,L]"

+ will no longer be encoded in the backreference, so its special meaning in the query string (as an encoded space) will still apply.

NB: You can't encode only spaces (since a space cannot be used as the last character), hence the additional ? character. Consequently, the flags argument needs to be surrounded in double quotes, since spaces are otherwise argument delimiters.
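The following is a rough model of `[B= ?]`. It is an assumption about the behaviour described above, not Apache's code. Only characters listed in the B argument are escaped (a space as +, anything else as lowercase %xx), and everything else, including a literal +, is copied through. The helper `escape_restricted` and the sample path are hypothetical.

```python
from urllib.parse import parse_qs


def escape_restricted(value: str, escapeme: str) -> str:
    """Rough model of [B=<chars>]: escape only the (ASCII) chars in `escapeme`."""
    out = []
    for ch in value:
        if ch not in escapeme:
            out.append(ch)          # copied as-is, so a "+" keeps its query meaning
        elif ch == " ":
            out.append("+")
        else:
            out.append("%{:02x}".format(ord(ch)))
    return "".join(out)


path = "red+green blue?"  # hypothetical decoded URL-path captured as $1
query = "query=" + escape_restricted(path, " ?")
print(query)            # query=red+green+blue%3f
print(parse_qs(query))  # {'query': ['red green blue?']}
```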


MrWhite
  • Adding the B flag was enough in the case of my server to solve this issue, which does indeed only appear to have emerged through a bug in the latest Apache: Server version: Apache/2.4.56 (cPanel) Server built: Mar 8 2023 15:06:38 – David Bennett Mar 09 '23 at 15:21
  • Thanks! Adding the B flag has fixed this for me. On your other points: I can't remember why I didn't use \d; I suspect I copied an old rule that only allowed 0-5 at one point... As to spaces: the "correct" ID will always have one space, but I am deliberately allowing users to enter a URL with/out it. So your criticism of my current solution is completely valid, but in this specific case, it doesn't matter. And the +/- issue: I require exactly one + or - sign (but not both), so you are right, just [+-] would be better. I was obviously half asleep when building the original regexp! – Phil Evans Mar 09 '23 at 15:30
  • I would like to thank @MrWhite for their explanation. Adding the B flag also solved this issue for me for my Ajax-powered search feature. Thank you :) – Scott Richardson May 16 '23 at 03:05
  • We had a similar issue in our recently patched RHEL 8 systems, which updated Apache to 2.4.37-51. This: `RewriteRule ^foo/bar/(.*)$ https://example.com/ab?cd=search&ef=1&q=$1 [L]` stopped working when $1 had spaces in it, I think encoded as %20. Our sites, which use CloudFront would give a CloudFront error when the above rewrite triggered. Using this instead: `RewriteRule ^foo/bar/(.*)$ https://example.com/ab?cd=search&ef=1&q=$1 [B,L,NE]` looks to resolve it. No more CloudFront error and the redirect works, encoding the spaces as + or %2b. Thank you @MrWhite! – Special Monkey Jul 09 '23 at 01:34
  • @SpecialMonkey "encoding the spaces as + or %2b" - `%2b` is a literal `+` (plus), not a _space_. When encoding the _space_ in the query string part of the URL it would need to be either `+` or `%20`. The URL-path that the `RewriteRule` _pattern_ matches against has already been URL decoded - which is the problem since it captures a _literal space_, not an encoded space. – MrWhite Jul 10 '23 at 10:32
  • @MrWhite Thanks for the clarification. To my knowledge, this type of rewrite broke after a RHEL 8 patch to Apache; I believe related to CVE-2023-25690. The `[B,L,NE]` flag set seems to fix. The `[L]` only flag I think caused a 403. Some other combination that didn't turn spaces into plusses in the query, maybe `[B,L]`, caused the target redirect to fail. It landed on the target system but the query didn't produce the desired result. The target system I think expected %20 for spaces in the query but also worked with plusses. – Special Monkey Jul 10 '23 at 15:16

It's a recent security fix.

apache2 (2.4.52-1ubuntu4.4) jammy-security; urgency=medium

  * SECURITY UPDATE: HTTP request splitting with mod_rewrite and mod_proxy
    - debian/patches/CVE-2023-25690-1.patch: don't forward invalid query
      strings in modules/http2/mod_proxy_http2.c,
      modules/mappers/mod_rewrite.c, modules/proxy/mod_proxy_ajp.c,
      modules/proxy/mod_proxy_balancer.c, modules/proxy/mod_proxy_http.c,
      modules/proxy/mod_proxy_wstunnel.c.
    - debian/patches/CVE-2023-25690-2.patch: Fix missing APLOGNO in
      modules/http2/mod_proxy_http2.c.
    - CVE-2023-25690
  * SECURITY UPDATE: mod_proxy_uwsgi HTTP response splitting
    - debian/patches/CVE-2023-27522.patch: stricter backend HTTP response
      parsing/validation in modules/proxy/mod_proxy_uwsgi.c.
    - CVE-2023-27522

 -- Marc Deslauriers <marc.deslauriers@ubuntu.com>  Wed, 08 Mar 2023 12:32:01 -0500

Halfgaar
  • I still have this problem (and need the B flag fix) after updating apache to Apache/2.4.56 (Ubuntu 18.04 LTS) Server built: 2023-03-09T07:33:59 I guess this ubuntu version didn't get the Apache2 security update patch? – user6096790 Mar 17 '23 at 19:32
  • @user6096790 the B flag fix is what you need when the server did get the fix. – Halfgaar Mar 18 '23 at 09:08

Debugging Apache (ErrorLog with LogLevel rewrite:trace6) shows that calling

/FISH%20J12345.6-78919

with

RewriteRule ^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1 [L,QSA]

decodes the %20 correctly to Space before mod_rewrite gets it. And the URL is rewritten to

'myPage.php?sourceName=FISH J12345.6-78919'

There is a space in the query param, and mod_rewrite does not like this (anymore).
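As a rough illustration, here is a simplified model of the check that now triggers AH10411. It is an assumption based only on the error text ("Rewritten query string contains control characters or spaces"), not Apache's actual code. The rewritten query string is rejected if any character is a control character, DEL or a space. The helper name `has_ctl_or_space` is hypothetical.

```python
def has_ctl_or_space(query: str) -> bool:
    """Simplified model of the AH10411 check: any char <= 0x20 or DEL (0x7f)."""
    return any(ord(ch) <= 0x20 or ord(ch) == 0x7F for ch in query)


for q in [
    "sourceName=FISH J12345.6-78919",      # produced by the rule without [B]
    "sourceName=FISH+J12345%2e6%2d78919",  # produced by the rule with [B]
]:
    print(q, "->", "403 (AH10411)" if has_ctl_or_space(q) else "accepted")
```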

Actually two things happen with mod_rewrite and a rule like

RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

First, the PATH part of the URL is decoded (beware that a + in the PATH part is a literal + and is not decoded to a space) and handed to mod_rewrite, which puts the matched text into $1. The QUERY part of the original URL is not decoded, but is merged (via QSA) into the rewritten URL. That new URL is then handed back to Apache, and PHP decodes the QUERY params. This makes for a double decoding of the PATH part, since in the rewritten URL it has become a QUERY param.

For example, without [B], /A%2520B/?a=b%2520c (%25 decodes to %) is rewritten to q=A%20B/&a=b%2520c, which ends up in PHP as "q" => "A B/", "a" => "b%20c". That is not quite what you might expect at first sight (it is certainly not what I expected until now, which was "q" => "A%20B/").

So using [B] when moving PATH parts into a QUERY param is probably the better choice anyway, as it ensures they only get decoded once.

With [B], /A%2520B/?a=b%2520c is finally rewritten to q=A%2520B%2f&a=b%2520c, ending up in PHP as "q" => "A%20B/", "a" => "b%20c". That looks better to me.
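The two cases above can be reproduced with a short sketch. `unquote` models Apache decoding the path, `parse_qs` stands in for PHP decoding the query, and `quote(..., safe="")` stands in for the B flag. It uses uppercase hex and %20 rather than Apache's lowercase hex and +, but the decoded result is the same.

```python
from urllib.parse import parse_qs, quote, unquote

# Example request /A%2520B/?a=b%2520c with rule ^(.*)$ index.php?q=$1 [L,QSA]
request_path, request_query = "A%2520B/", "a=b%2520c"

captured = unquote(request_path)  # first decode, done by Apache: 'A%20B/'

# Without [B]: $1 goes into the query string unescaped, so PHP decodes it again.
without_b = parse_qs("q=" + captured + "&" + request_query)

# With [B]: $1 is re-escaped first, so PHP's decode just undoes that escaping.
with_b = parse_qs("q=" + quote(captured, safe="") + "&" + request_query)

print(without_b)  # {'q': ['A B/'], 'a': ['b%20c']}
print(with_b)     # {'q': ['A%20B/'], 'a': ['b%20c']}
```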

With [B], the FISH link gets encoded as follows (trace log: escaping backreference 'FISH J12345.6-78919' to 'FISH+J12345%2e6%2d78919'), so the space is encoded as + (not %20). PHP then decodes it again.

I suppose that for single-encoded PATH parts, not using [B] was OK in most cases, most likely because the % sign is rarely used in PATH parts. For me, using [B] is now the better solution.

There is one caveat, already answered elsewhere here: as + is valid in the PATH part, /A+%2bB/ is passed to mod_rewrite as A++B/ (so the first + stays a +), and is finally passed on as q=A%2b%2bB%2f, ending up in PHP as "q" => "A++B/". This cannot be overcome, as + is handled differently in the PATH part than in the QUERY part.
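The asymmetry comes from the two decoding rules, which Python happens to expose as separate functions. `unquote` follows path rules, where + is literal. `unquote_plus` follows form/query rules, where + means a space. `parse_qs` again stands in for PHP's query parsing.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

path = "A+%2bB/"

as_path = unquote(path)        # path rules: "+" is literal -> 'A++B/'
as_query = unquote_plus(path)  # query rules: "+" is a space -> 'A +B/'

# After [B] re-escapes both "+" characters, PHP sees two literal pluses.
php_sees = parse_qs("q=A%2b%2bB%2f")

print(as_path, as_query, php_sees)
```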