I'd like to build an ISAPI_Rewrite 3 "RewriteRule" to handle the following permanent redirects:

╔════════════════════════════════╦════════════════════════════╗
║             Input              ║           Redirect         ║
╠════════════════════════════════╬════════════════════════════╣
║ /path/?a=foo&b=bar             ║ /path/foo/bar/             ║
║ /path/?b=baz&a=qux             ║ /path/qux/baz/             ║
║ /path/?c=1&a=foo&d=2&b=bar&e=3 ║ /path/foo/bar/?c=1&d=2&e=3 ║
╚════════════════════════════════╩════════════════════════════╝

For example:

RewriteCond %{QUERY_STRING} (?:^|&)a=([\w]+)(?:&|$)
RewriteCond %{QUERY_STRING} (?:^|&)b=([\w]+)(?:&|$)
RewriteRule ^/path/$ /path/%1/%2/? [R=301]

will work, except that it strips all of the remaining query string pairs (failing the third example). I can't seem to figure out an elegant solution that strips only the known key/value pairs from the URL. And something like...

RewriteCond %{QUERY_STRING} ^(.*)&?a=([\w]+)(.*)&?b=([\w]+)(.*)$
RewriteRule ^/path/$ /path/%2/%4/?%1%3%5 [R=301]

isn't exactly right (but you should get what the example is trying to do), and gets real messy in a hurry.

Any ideas?

Clarification: In my third example, a request for /path/?c=1&a=foo&d=2&b=bar&e=3 should redirect to /path/foo/bar/?c=1&d=2&e=3 and NOT /path/foo/bar/1/2/3/. I may not know in advance which query string pairs will accompany a request, and some of them must stay in the query string for client-side processing. Some examples of such unknown query string keys are:

  • "gclid" - used by Google Analytics (GA) client script to tie in Adwords data
  • "utm_source" - used to explicitly tell GA the traffic source/type
David Budiac
  • As a tidbit, checking for a parameter can be simplified a little, as `(?:^|&)a=([^&]+)`, where your non-capturing start or ampersand still pins down the start (avoiding xa=123), but the value can be simply not-ampersand (which will stop at the next ampersand, or run to the end if none). – goodeye May 23 '13 at 16:38
  • I think separate rules is the way to go, but there are many of them. I can do a specific answer later, but the idea is: ab, ba, abe, cab, cadb, cabe, cadbe. The problem with optional capturing is you end up with an 'extra' ampersand which is hard to deal with, so specific permutations are easier (just a lot of them). Question: are there always `a` and `b` parameters (in either order), or might there be one or the other? – goodeye May 23 '13 at 17:08
  • I think I have a solution. In Application_BeginRequest() in global.asax (runs before .htaccess rules), I'll inspect that the various dynamic pages have their query string variables in the proper order, and then let .htaccess process with one rule. – David Budiac May 23 '13 at 21:13
  • This all came about because I saw different page records/stats in Google Analytics for /path/foo/bar/ and /path/Default.aspx?a=foo&b=bar... which serve the same page; however, the latter does not redirect to the former (as it should) – David Budiac May 23 '13 at 21:14
  • This might be easier with code, instead of rules. I always seem to have trouble with global.asax; if so, you could also do it right in default.aspx - if the url is incorrect, fix it and redirect. – goodeye May 23 '13 at 22:02
  • Another tidbit, I keep getting caught by: It's `%{QUERY_STRING}` not `${QUERY_STRING}`. Mine just silently doesn't match, it's not an error, which was very frustrating. – goodeye May 28 '13 at 22:12
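
To make the anchoring tidbit from the comments concrete, here is a small Python sketch that runs the same pattern against a few query strings. Python's `re` module is only a stand-in for the rewrite engine's regex flavor, and `find_a` is a made-up helper name, so treat it as an illustration rather than a test of ISAPI_Rewrite itself.

```python
import re

# Pattern from the comment: the parameter name must start the query string
# or follow an ampersand; the value runs up to the next ampersand (or the end).
A_PARAM = re.compile(r"(?:^|&)a=([^&]+)")


def find_a(query):
    """Return the value of `a` in a raw query string, or None if absent."""
    match = A_PARAM.search(query)
    return match.group(1) if match else None


if __name__ == "__main__":
    for query in ("a=foo&b=bar", "c=1&a=foo&d=2", "xa=123"):
        print(query, "->", find_a(query))
```

The `xa=123` input yields no match because `a=` is neither at the start nor preceded by an ampersand.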

2 Answers

I did this in two stages. The first is a set of rules that extracts one parameter, leaving the others as-is. Then I expanded this to two parameters. The rules themselves aren't that bad; it was just tricky figuring it out.

One Parameter

This uses the a parameter for the example. There are four cases:

  • a only
  • a-other
  • other-a
  • other-a-other

Because of the question mark vs. ampersand, it's simplest to do separate rules. It turned out that the last two were easy to combine into one rule.

(Note: I'm using Helicon Ape rewrite, which is generally compatible with Apache. I had an issue where the RewriteRule question mark needs to be escaped before parameters, e.g., \?%2. I don't know if this is true in general.)

# a
#  ?a=foo
#  Starts with a=, non-ampersand to the end.
#  Suppress querystring with trailing question mark.
RewriteCond %{QUERY_STRING} ^a=([^&]+)$
RewriteRule ^/path/$ /path/%1/? [NC,R=301,L]

# a-other
#  ?a=foo&b=bar, ?a=foo&b=bar&c=1
#  Starts with a=, non-ampersand, ampersand, remaining required.
#  Escape question mark so it doesn't include entire original querystring.
RewriteCond %{QUERY_STRING} ^a=([^&]+)&(.+)
RewriteRule ^/path/$ /path/%1/\?%2 [NC,R=301,L]

# other-a or other-a-other
#  ?b=baz&a=qux, ?b=baz&c=1&a=qux
#  ?c=1&a=foo&d=2&b=bar&e=3, ?z=4&c=1&a=foo&d=2&b=bar&e=3
#  Starts with anything, ampersand, a=, non-ampersand, remaining optional.
#  The remaining optional lets it follow with nothing, or with ampersand and more parameters.
#  Escape question mark so it doesn't include entire original querystring.
RewriteCond %{QUERY_STRING} ^(.+)&a=([^&]+)(.*)$
RewriteRule ^/path/$ /path/%2/\?%1%3 [NC,R=301,L]



Two Parameters

To put these together for both parameters is a little tricky, but the idea is to rewrite a, then fall through to rewrite b and redirect. To manage the two sections together, rewrite path to temppath then temppath2, then rewrite it back to path when done. This ensures that these only run when both a and b are present. If only one or the other is present, then it skips all this. (If you meant to also handle only one, this can be adjusted.)

# Test cases:
#  1) /path/?a=foo&b=bar    to    /path/foo/bar/
#  2) /path/?a=foo&b=bar&c=1     to    /path/foo/bar/?c=1
#  3) /path/?a=foo&b=bar&c=1&d=2    to    /path/foo/bar/?c=1&d=2
#  4) /path/?b=baz&a=qux    to    /path/qux/baz/
#  5) /path/?b=baz&c=1&a=qux    to    /path/qux/baz/?c=1
#  6) /path/?c=1&b=baz&a=qux    to    /path/qux/baz/?c=1
#  7) /path/?c=1&d=2&b=baz&a=qux    to    /path/qux/baz/?c=1&d=2
#  8) /path/?c=1&a=foo&d=2&b=bar&e=3    to    /path/foo/bar/?c=1&d=2&e=3
#  9) /path/?z=4&c=1&a=foo&d=2&b=bar&e=3    to    /path/foo/bar/?z=4&c=1&d=2&e=3

# Check for a and b (or b and a), rewrite to temp path and continue.
RewriteCond %{QUERY_STRING} (?:^|&)(?:a|b)=.+&(?:b|a)=.+$
RewriteRule ^/path/$ /temppath/ [NC]

# a
#  ?a=foo
#  This case isn't needed, since we test for a and b above.

# a-other
#  1) /temppath/?a=foo&b=bar    to    /temppath2/foo/?b=bar
#  2) /temppath/?a=foo&b=bar&c=1     to    /temppath2/foo/?b=bar&c=1
#  3) /temppath/?a=foo&b=bar&c=1&d=2    to    /temppath2/foo/?b=bar&c=1&d=2
#  Starts with a=, non-ampersand, ampersand, remaining required.
RewriteCond %{QUERY_STRING} ^a=([^&]+)&(.+)$
RewriteRule ^/temppath/$ /temppath2/%1/\?%2 [NC]

# other-a or other-a-other
#  4) /temppath/?b=baz&a=qux    to    /temppath2/qux/?b=baz
#  5) /temppath/?b=baz&c=1&a=qux    to    /temppath2/qux/?b=baz&c=1
#  6) /temppath/?c=1&b=baz&a=qux    to    /temppath2/qux/?c=1&b=baz
#  7) /temppath/?c=1&d=2&b=baz&a=qux    to    /temppath2/qux/?c=1&d=2&b=baz
#  8) /temppath/?c=1&a=foo&d=2&b=bar&e=3    to    /temppath2/foo/?c=1&d=2&b=bar&e=3
#  9) /temppath/?z=4&c=1&a=foo&d=2&b=bar&e=3    to    /temppath2/foo/?z=4&c=1&d=2&b=bar&e=3
#  Starts with anything, ampersand, a=, non-ampersand, remaining optional.
#  The remaining optional lets it follow with nothing, or with ampersand and more parameters.
#  Escape question mark so it doesn't include entire original querystring.
RewriteCond %{QUERY_STRING} ^(.+)&a=([^&]+)(.*)$
RewriteRule ^/temppath/$ /temppath2/%2/\?%1%3 [NC]

# b
#  1) /temppath2/foo/?b=bar    to    /path/foo/bar/
#  4) /temppath2/qux/?b=baz    to    /path/qux/baz/
#  Starts with b=, non-ampersand to the end.
#  Capture and use path after temppath2, since it has the a folder from above.
RewriteCond %{QUERY_STRING} ^b=([^&]+)$
RewriteRule ^/temppath2/(.*)/$ /path/$1/%1/? [NC,R=301,L]

# b-other
#  2) /temppath2/foo/?b=bar&c=1    to    /path/foo/bar/?c=1
#  3) /temppath2/foo/?b=bar&c=1&d=2    to    /path/foo/bar/?c=1&d=2
#  5) /temppath2/qux/?b=baz&c=1    to    /path/qux/baz/?c=1
#  Starts with b=, non-ampersand, ampersand, remaining required.
#  Capture and use path after temppath2, since it has the a folder from above.
#  Escape question mark so it doesn't include entire original querystring.
RewriteCond %{QUERY_STRING} ^b=([^&]+)&(.+)$
RewriteRule ^/temppath2/(.*)/$ /path/$1/%1/\?%2 [NC,R=301,L]

# other-b or other-b-other
#  6) /temppath2/qux/?c=1&b=baz    to    /path/qux/baz/?c=1
#  7) /temppath2/qux/?c=1&d=2&b=baz    to    /path/qux/baz/?c=1&d=2
#  8) /temppath2/foo/?c=1&d=2&b=bar&e=3    to    /path/foo/bar/?c=1&d=2&e=3
#  9) /temppath2/foo/?z=4&c=1&d=2&b=bar&e=3    to    /path/foo/bar/?z=4&c=1&d=2&e=3
#  Starts with anything, ampersand, b=, non-ampersand, remaining optional.
#  The remaining optional lets it follow with nothing, or with ampersand and more parameters.
#  Capture and use path after temppath2, since it has the a folder from above.
#  Escape question mark so it doesn't include entire original querystring.
RewriteCond %{QUERY_STRING} ^(.+)&b=([^&]+)(.*)$
RewriteRule ^/temppath2/(.*)/$ /path/$1/%2/\?%1%3 [NC,R=301,L]
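
As a rough sanity check of the two-pass idea, the sketch below replays the same regular expressions in Python against the nine test cases. It only emulates the pattern matching (pull out a, then pull out b from what is left); it does not model the rewrite engine, flags, or the temppath hops, and `pull_param` and `redirect_target` are hypothetical names chosen for this illustration.

```python
import re


def pull_param(query, name):
    """Mimic the name-only / name-other / other-name(-other) conditions.

    Returns (value, remaining_query), or None if none of the patterns match.
    """
    key = re.escape(name)
    # name first, followed by at least one more parameter
    m = re.match(rf"^{key}=([^&]+)&(.+)$", query)
    if m:
        return m.group(1), m.group(2)
    # name is the only parameter
    m = re.match(rf"^{key}=([^&]+)$", query)
    if m:
        return m.group(1), ""
    # something before name, optionally something after it
    m = re.match(rf"^(.+)&{key}=([^&]+)(.*)$", query)
    if m:
        return m.group(2), m.group(1) + m.group(3)
    return None


def redirect_target(query):
    """Return the redirect URL for /path/?<query>, or None if a or b is missing."""
    # Same guard as the first rule: both a and b must be present.
    if not re.search(r"(?:^|&)(?:a|b)=.+&(?:b|a)=.+$", query):
        return None
    first = pull_param(query, "a")
    if first is None:
        return None
    a_value, rest = first
    second = pull_param(rest, "b")
    if second is None:
        return None
    b_value, rest = second
    url = f"/path/{a_value}/{b_value}/"
    return f"{url}?{rest}" if rest else url


if __name__ == "__main__":
    print(redirect_target("c=1&a=foo&d=2&b=bar&e=3"))
```

Because `(.+)` is greedy, a repeated parameter would have its last occurrence extracted; the sketch inherits that behavior from the patterns.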


It is probably easier in code....
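
For comparison, here is a minimal sketch of what the "in code" route could look like. It is written in Python purely to show the logic; the site in question is ASP.NET, so a real version would live in global.asax or the page itself. `canonical_url` and its `promoted` parameter are made-up names, and the choice to let the last occurrence of a repeated key win is an assumption.

```python
from urllib.parse import parse_qsl, quote, urlencode


def canonical_url(path, query, promoted=("a", "b")):
    """Move the promoted keys into the path and keep every other pair.

    Returns the URL to redirect to, or None when a promoted key is missing
    or empty (meaning: leave the request alone).
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    values = dict(pairs)  # assumption: last occurrence wins for repeated keys
    if any(not values.get(name) for name in promoted):
        return None
    # Promoted values become path segments, in the fixed order a, b.
    segments = "".join(quote(values[name], safe="") + "/" for name in promoted)
    # Everything else stays in the query string, in its original order.
    remaining = [(key, value) for key, value in pairs if key not in promoted]
    new_url = path + segments
    if remaining:
        new_url += "?" + urlencode(remaining)
    return new_url


if __name__ == "__main__":
    print(canonical_url("/path/", "gclid=abc&b=bar&a=foo"))
```

Parameter order stops mattering here because the query string is parsed into pairs first. Note that `urlencode` re-encodes the leftover values, so unusual characters may come out normalized rather than byte-for-byte identical.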

goodeye

It's an interesting one! In my experience, it's better to have several simple conditions than one cumbersome rule. It's usually more effective and efficient.

My suggestion:

# a-b pair
RewriteCond %{QUERY_STRING} a=([^&]+)
RewriteCond %{QUERY_STRING} b=([^&]+)
RewriteRule ^/path/$ /path/%1/%2/? [NC,R=301,L]

# a-b-c pair
RewriteCond %{QUERY_STRING} a=([^&]+)
RewriteCond %{QUERY_STRING} b=([^&]+)
RewriteCond %{QUERY_STRING} c=([^&]+)
RewriteRule ^/path/$ /path/%1/%2/%3/? [NC,R=301,L]

This may not be as elegant as you want, but it should still do the trick. I can also come up with some more approaches.

Andrew
  • Splitting into separate rules is totally fine. However, I'm never trying to redirect to /path/%1/%2/%3/ in your a-b-c pair example. If you're including "c", I'd want to direct to /path/%1/%2/?c=value instead. The point is that the value for "c" needs to be available in the query string for the client to parse... which could be any number of keys like: "gclid" (google uses to pass Adwords info to client-side analytics script) or "utm_medium" (google analytics client side script also uses to process the medium) – David Budiac May 21 '13 at 17:33
  • Seems like your code will not work. %1 will mean the value of b variable in first example, and the value of c variable in second example. – Hamid Sarfraz Jul 04 '16 at 12:00