
I would like to have pretty URLs for my tagging system that can include special characters such as +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double-encode the links?

I notice that delicious.com and Stack Overflow seem to be able to handle singly encoded special characters. What's the magic formula?

Here's an example of what I want to happen:

http://www.example.com/tag/c%2b%2b

would trigger the following RewriteRule:

RewriteRule ^tag/(.*)   script.php?tag=$1

and the value of tag would be "c++"

Apache/mod_rewrite doesn't normally work like this: it seems to turn the plus signs into spaces. If I double-encode the plus sign as '%252B', I get the desired result, but that makes for messy URLs and seems pretty hacky to me.

Seenu S
Aldie

5 Answers


The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces.

I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.

So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string in the application/x-www-form-urlencoded format, the escaping rules are very slightly different from those that apply in path parts. In particular, '+' is shorthand for a space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).

So PHP's form-reading code receives 'c++' and puts it in your $_GET as 'c' followed by two spaces.
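To see the two decoding steps side by side, here is a small Python model. It is only a sketch: Python's unquote stands in for Apache's path decoding, and parse_qs stands in for PHP's $_GET parsing, on the assumption that both follow the same "'+' means space" form rules.

```python
from urllib.parse import unquote, parse_qs

# Step 1: Apache decodes the path before mod_rewrite sees it.
# In a path, %2B simply becomes a literal '+'.
path_value = unquote("c%2b%2b")          # 'c++'

# Step 2: mod_rewrite pastes the decoded text into a query string
# (modelled here with string concatenation).
query = "tag=" + path_value              # 'tag=c++'

# Step 3: the form decoder (modelled by parse_qs) treats '+' as a space.
tag = parse_qs(query)["tag"][0]
print(repr(tag))                         # 'c  '
```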

It looks like the way around this is to use the RewriteRule flag 'B' (escape backreferences). See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags; curiously, it uses more or less the same example!

RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
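A rough Python model of what [B] changes for this rule. It assumes [B] can be approximated by percent-encoding the backreference with quote(..., safe=""); the exact set of characters Apache escapes differs between versions, and rewrite_tag is a hypothetical helper, not part of any Apache API.

```python
from urllib.parse import quote, unquote, parse_qs


def rewrite_tag(request_path: str, escape_backrefs: bool) -> str:
    """Rough model of: RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]"""
    prefix = "/tag/"
    if not request_path.startswith(prefix):
        raise ValueError("rule does not match")
    # Apache has already decoded the path once, so $1 holds decoded text.
    backref = unquote(request_path[len(prefix):])
    if escape_backrefs:
        # Stand-in for [B]: percent-encode the backreference before substitution.
        backref = quote(backref, safe="")
    return "/script.php?tag=" + backref


for flag in (False, True):
    target = rewrite_tag("/tag/c%2b%2b", escape_backrefs=flag)
    query = target.split("?", 1)[1]
    # parse_qs stands in for PHP filling $_GET.
    print(flag, target, repr(parse_qs(query)["tag"][0]))
```

Without the flag the script sees 'c  '; with it, the query string carries c%2B%2B and the script sees 'c++'.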
bobince
    Note: The B flag is available since Apache 2.2. – Gumbo Jan 20 '09 at 17:13
  • Warning: The B flag turns "C%2B%2B+Stack+Overflow" into "C+++Stack+Overflow". – vallentin Sep 10 '15 at 11:59
  • @Vallentin: that is correct behaviour. In a URL path part, `+` literally means a plus, and so it's semantically equivalent to `%2B`. `+` only represents a space in the query part (application/x-www-form-urlencoded rules instead of pure URL rules). – bobince Sep 11 '15 at 00:04
  • @bobince yeah, but if you "manually" do "?tags=C%2B%2B+Stack+Overflow" then it becomes `C++ Stack Overflow`. That's the problem I'm currently trying to overcome, [questioned here](http://stackoverflow.com/questions/32502135/htaccess-decodes-both-2b-and-into-space). – vallentin Sep 11 '15 at 00:08
  • I have almost the same problem, but it seems unsolvable. Could you please take a look at it? http://stackoverflow.com/questions/34564364/how-to-pass-a-persian-string-as-a-argument-in-the-url – Shafizadeh Jan 02 '16 at 20:45
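The difference discussed in these comments can be checked with Python's urllib.parse, used here as a stand-in for path decoding (unquote) and form decoding (unquote_plus). The [B] step is again approximated with quote(..., safe=""), which is an assumption rather than Apache's exact escaping.

```python
from urllib.parse import quote, unquote, unquote_plus

raw = "C%2B%2B+Stack+Overflow"

# Path rules: '+' is a literal plus; only %XX sequences are decoded.
as_path = unquote(raw)                    # 'C+++Stack+Overflow'

# Query-string (form) rules: '+' means space, then %XX is decoded.
as_query = unquote_plus(raw)              # 'C++ Stack Overflow'

# With [B], the path-decoded value is re-escaped before it goes into the
# query string, so every '+' survives the form decoder as a literal plus.
via_b_flag = unquote_plus(quote(as_path, safe=""))  # 'C+++Stack+Overflow'

print(as_path, as_query, via_b_flag, sep="\n")
```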

I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is

RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]

which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).

At least, I think that's how it works . . . I've never used that particular flag myself.
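The escaping step that NE switches off can be modelled in Python. This sketch assumes mod_rewrite's default escaping of a redirect target behaves roughly like quote() with '/' and '=' left alone, so a literal '%' becomes '%25'; redirect_target is a hypothetical helper, not Apache code.

```python
from urllib.parse import quote, unquote


def redirect_target(capture: str, noescape: bool) -> str:
    r"""Rough model of: RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]"""
    # '\%3d' in the substitution is a literal '%3d', not a backreference.
    target = "/bar/arg=P1%3d" + capture
    if not noescape:
        # Approximation of the default escaping: '%' itself becomes '%25'.
        target = quote(target, safe="/=")
    return target


for ne in (False, True):
    location = redirect_target("zed", noescape=ne)
    # The server decodes the redirected request path once.
    print(ne, location, unquote(location))
```

Without NE the Location header would carry %253d, which decodes to %3d rather than '='; with NE the single decode yields /bar/arg=P1=zed.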

David Z

I ran into a similar problem with mod_rewrite and a + sign in the URL. The scenario was as follows:

We have a URL containing + signs that needs to be rewritten, such as http://deskdomain/2013/08/09/a+b+c.html

RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1

The Struts action urlRedirect gets the url parameter, makes some changes, and uses the URL for another redirect. But in req.getParameter("url") the + signs turn into spaces, so the url parameter's content is http://deskdomain/2013/08/09/a b c.html, which makes the redirect fail with 404 Not Found. To resolve it (with help from the previous answers), we use the rewrite flags B (escape backreferences) and NE (noescape):

RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1 [B,NE]

The B flag escapes + to %2B, and NE prevents mod_rewrite from escaping %2B again to %252B (double-escaping the + sign), so req.getParameter("url") returns http://deskdomain/2013/08/09/a+b+c.html

I think the reason is that req.getParameter("url") unescapes the value for us, and a + sign unescapes to a space. If you unescape %2B once you get +, and unescaping that + again gives a space:

"%2B" unescape-> "+" unescape-> " "
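The double decoding described above can be reproduced in Python. This assumes the servlet container's req.getParameter() decodes query values with form rules, modelled here by unquote_plus, and uses quote(..., safe="") only as a rough stand-in for [B].

```python
from urllib.parse import quote, unquote, unquote_plus

# %2B decoded once gives '+'; decoding that '+' again under form rules gives ' '.
once = unquote("%2B")          # '+'
twice = unquote_plus(once)     # ' '

url = "http://deskdomain/2013/08/09/a+b+c.html"

# Without [B]: the raw '+' signs reach the form decoder and turn into spaces.
without_b = unquote_plus(url)

# With [B,NE]: the value arrives escaped, and the single form decode
# done on the receiving side restores the '+' signs.
with_b = unquote_plus(quote(url, safe=""))

print(repr(without_b))
print(repr(with_b))
```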

yren

I finally made it work with the help of RewriteMap.

I added the escape map in the httpd.conf file:

RewriteMap es int:escape

and used it in the rewrite rule:

RewriteRule ([^?.]*) /abc?arg1=${es:$1}&country_sniff=true [L]
Mr.Wizard
Nitin

The underlying problem is that you are moving from a request that has one encoding (specifically, a plus sign is a plus sign) into a request that has different encoding (a plus sign represents a space). The solution is to bypass the decoding that mod_rewrite does and convert your path directly from the raw request to the query string.

To bypass the normal flow of the rewrite rules, we’ll load the raw request string directly into an environment variable and modify the environment variable instead of the normal rewrite path. It will already be encoded, so we don't generally need to worry about encoding it when we move it to the query string. What we do want, however, is to percent-encode the plus signs so that they are properly relayed as plus signs and not spaces.

The rules are incredibly simple:

RewriteEngine On

RewriteRule ^script.php$ - [L]

# Move the path from the raw request into _rq
RewriteCond %{ENV:_rq} =""
RewriteCond %{THE_REQUEST} "^[^ ]+ (/path/[^/]+/[^? ]+)"
RewriteRule .* - [E=_rq:%1]

# encode the plus signs (%2B)  (Loop with [N])
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)\+(.*)$"
RewriteRule .* - [E=_rq:/path/%1/%2\%2B%3,N]

# finally, move it from the path to the query string
# ([NE] says to not re-code it)
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)$"
RewriteRule .* /path/script.php?%1=%2 [NE]
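For tracing the rule chain without a server, here is a rough Python model of the three rule groups. It is an assumption-laden sketch: it skips the ^script.php$ guard and the %{ENV:_rq} check, uses Python regexes in place of Apache's, keeps the /path/ placeholder from the rules, and rewrite is a hypothetical helper name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs

REQUEST_RE = re.compile(r"^[^ ]+ (/path/[^/]+/[^? ]+)")
PLUS_RE = re.compile(r"/path/([^/]+)/(.*)\+(.*)$")
FINAL_RE = re.compile(r"/path/([^/]+)/(.*)$")


def rewrite(the_request: str) -> Optional[str]:
    """Rough model of the rule chain; the_request is the raw request line."""
    # Rule group 1: copy the still-encoded path out of THE_REQUEST into _rq.
    m = REQUEST_RE.match(the_request)
    if m is None:
        return None
    rq = m.group(1)

    # Rule group 2: replace the last '+' with %2B and repeat, like the [N] loop.
    m = PLUS_RE.search(rq)
    while m is not None:
        rq = f"/path/{m.group(1)}/{m.group(2)}%2B{m.group(3)}"
        m = PLUS_RE.search(rq)

    # Rule group 3: move the segments into the query string as-is ([NE]).
    m = FINAL_RE.search(rq)
    assert m is not None  # rq always starts with /path/<name>/
    return f"/path/script.php?{m.group(1)}={m.group(2)}"


target = rewrite("GET /path/tag/c++ HTTP/1.1")
print(target)                                # /path/script.php?tag=c%2B%2B
print(parse_qs(target.split("?", 1)[1]))     # {'tag': ['c++']}
```

Because the value is taken from the raw request line, an already-encoded request such as /path/tag/c%2B%2B passes through unchanged and decodes to the same 'c++'.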

This trivial script.php confirms that it works:

<input readonly type="text" value="<?php echo htmlspecialchars($_GET['tag']); ?>" />
danorton