
Good old regular expressions are driving me nuts.

I need to redirect all traffic in Apache 2.4 from HTTP to HTTPS, except for "/bt/sub/[a_few_endings]", using Redirect from mod_alias (can't use mod_rewrite).

I tested the following regular expression in all online testers I know (e.g. http://regex101.com/) and all confirm that the regex should indeed match everything except the URLs I don't want it to match:

^/(?!bt/sub/(went_active|success|cancel|expired)).*$ 

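As far as I can tell, this should match everything in http://local.mysite.com and redirect it to https://local.mysite.com, except for the following four:

http://local.mysite.com/bt/sub/went_active
http://local.mysite.com/bt/sub/success
http://local.mysite.com/bt/sub/cancel
http://local.mysite.com/bt/sub/expired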

Still, Apache redirects everything, including the above URLs I don't want redirected.
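
For reference, the pattern's behaviour can be checked outside Apache with a short script. This is only a sketch: it uses Python's `re` module rather than the PCRE library Apache links against (the two agree on this kind of negative lookahead), and the helper name `would_redirect` and the sample paths are made up for illustration.

```python
import re

# Pattern copied verbatim from the RedirectMatch directive in the question.
PATTERN = re.compile(r"^/(?!bt/sub/(went_active|success|cancel|expired)).*$")


def would_redirect(path: str) -> bool:
    """Return True if the pattern matches the URL path, i.e. a redirect would fire."""
    return PATTERN.match(path) is not None


if __name__ == "__main__":
    # Hypothetical sample paths: three that should redirect, four that should not.
    samples = [
        "/",
        "/about",
        "/bt/sub/other",
        "/bt/sub/went_active",
        "/bt/sub/success",
        "/bt/sub/cancel",
        "/bt/sub/expired",
    ]
    for path in samples:
        print(f"{path:25} redirect={would_redirect(path)}")
```

Because this version of the lookahead has no `$` inside it, it also exempts anything that merely starts with one of the four paths, such as `/bt/sub/success/extra`.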

I found several similar questions on SO, but most of them are answered in terms of mod_rewrite, which is not what I want/need, and the solutions that people say worked for them have not worked for me.

Here's my virtual host configuration as it currently stands:

<VirtualHost *:80> 
    ServerName local.mysite.com 
    RedirectMatch 302 ^/(?!bt/sub/(went_active|success|cancel|expired)).*$ https://local.mysite.com 
    DocumentRoot /home/borfast/projects/www/mysite/public 
    #Header set Access-Control-Allow-Origin * 
    SetEnv LARAVEL_ENV localdev 

    <Directory /home/borfast/projects/www/mysite/public/> 
        Options All 
        DirectoryIndex index.php 
        AllowOverride All 
        Require all granted 
    </Directory> 
</VirtualHost>

Please help and prevent me from going crazy :)


UPDATE: There's something weird going on: apparently when the requested URL/path can be found, Apache ignores the expression in RedirectMatch and redirects the client, even though the RedirectMatch tells it not to.

To test this I created a new virtualhost from scratch inside a separate VM freshly installed with Ubuntu Trusty 64, loaded with Apache 2.4. This new virtual host contained just the ServerName, RedirectMatch and DocumentRoot directives, like this:

<VirtualHost *:80>
    ServerName testing.com
    RedirectMatch 302 ^/(?!bt/sub/(went_active|success)$).*$ https://othersite.com/

    DocumentRoot /home/vagrant/www
</VirtualHost>

I created the directory /home/vagrant/www/bt/sub/went_active to make sure Apache could get to at least one of the two possible URLs. When trying to access http://testing.com:8080, I get redirected, as expected.

Then the weirdness comes: when accessing http://testing.com:8080/bt/sub/went_active, the URL that matches the directory I created, I am still redirected, even though I shouldn't be, but when accessing http://testing.com:8080/bt/sub/success, I don't get redirected and instead get a 403 Forbidden.

I may be losing my sanity over this but it seems that when Apache sees that it could serve the request and it matches the regular expression in RedirectMatch that should prevent the redirect, it decides to ignore the regular expression and do the redirect anyway. Three letters for this: WTF?!?!?!
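
This does not by itself explain what Apache is doing, but one property of the `$`-anchored pattern from the test virtual host can be checked outside Apache: it exempts only the exact paths, so the same path with a trailing slash still matches. A minimal sketch, again using Python's `re` rather than Apache's PCRE, with a hypothetical helper name:

```python
import re

# Pattern copied from the test virtual host (note the "$" inside the lookahead).
ANCHORED = re.compile(r"^/(?!bt/sub/(went_active|success)$).*$")


def would_redirect(path: str) -> bool:
    """Return True if the anchored pattern matches the path (redirect would fire)."""
    return ANCHORED.match(path) is not None


if __name__ == "__main__":
    for path in ["/", "/bt/sub/went_active", "/bt/sub/went_active/", "/bt/sub/success"]:
        print(f"{path:25} redirect={would_redirect(path)}")
```

Whether a trailing slash is actually being added to the request for the existing directory is an assumption that this thread does not confirm; comparing the `Location` headers of each hop (for example with `curl -I`) would show it.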

borfast
  • Your regex matches everything for me. I made this one, which matches just the ones you don't want replaced: **.*\.com\/foo\/bar\/(aaa|bbb|ccc|ddd)**. Just put it in a negative lookahead or something if you can't do the logic in code and need it to match the opposite – Vajura Oct 16 '14 at 08:55
  • No need in escaping of `/` – Cheery Oct 16 '14 at 18:29
  • This _would_ be easier using `mod_rewrite` … – CBroe Oct 16 '14 at 18:55
  • Yeah, but I explicitly said I need to use mod_alias. The world *would* be better if people wouldn't buy a new phone every year, among so many other things, but there are some things you just can't change. ;) – borfast Oct 16 '14 at 19:58

2 Answers


As was said in the comments, this is easier to do with mod_rewrite. Possible solutions:

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/bar/(abcd|baz|barista|yo)$ [NC]
RewriteRule ^ http://site/ [R=301,L]

Another one (for .htaccess, where the leading / is stripped from the path that RewriteRule matches against):

RewriteEngine On
RewriteRule !^bar/(abcd|baz|barista|yo)$ http://site/ [R=301,L,NC]

And a solution using RedirectMatch:

RedirectMatch permanent ^(?!/bar/(abcd|baz|barista|yo)$).* http://site/

All of them work. A problem you might hit while testing or debugging is that the browser caches the 301 response, so use a 302 response, not 301, while you are still checking or writing the rule. Also remove the NC flag if case insensitivity is not required.
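
To see how the negated RewriteCond pattern and the RedirectMatch lookahead relate, here is a rough Python sketch. It only models the regular expressions, not Apache's request handling; Python's `re` stands in for PCRE, `re.IGNORECASE` stands in for the `[NC]` flag, and the function names are hypothetical.

```python
import re

ENDINGS = r"(abcd|baz|barista|yo)"

# RewriteCond form: Apache's leading "!" negates the match, and [NC] ignores case.
COND = re.compile(rf"^/bar/{ENDINGS}$", re.IGNORECASE)

# RedirectMatch form: the exclusion lives inside a negative lookahead (case-sensitive).
LOOKAHEAD = re.compile(rf"^(?!/bar/{ENDINGS}$).*")


def redirect_by_cond(path: str) -> bool:
    """Redirect when the (case-insensitive) condition pattern does NOT match."""
    return COND.search(path) is None


def redirect_by_lookahead(path: str) -> bool:
    """Redirect when the lookahead pattern DOES match."""
    return LOOKAHEAD.search(path) is not None


if __name__ == "__main__":
    for path in ["/", "/foo", "/bar/baz", "/bar/bazz", "/BAR/BAZ"]:
        print(f"{path:12} cond={redirect_by_cond(path)} lookahead={redirect_by_lookahead(path)}")
```

The two forms agree on the lower-case paths and differ only on `/BAR/BAZ`, which is the effect of the NC flag mentioned above.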

Cheery
  • I understand mod_rewrite would be easier but I explicitly said that I need to use mod_alias. As for the cache, I had that in mind and tried several different endings (abcd, baz, etc - new one every time) as well as monitored the HTTP requests to make sure the browser was actually getting the right redirection headers. – borfast Oct 16 '14 at 19:56
  • @borfast I tried it and it works (Ubuntu, Apache 2.4.7, tried all of them). It means that your problem is not in this line, but somewhere else. Not enough information to tell exactly where it could be. – Cheery Oct 16 '14 at 20:07
  • that's pretty much the conclusion I'm coming to as well. I just updated my question with the full virtual host config, in case someone can spot the problem there. – borfast Oct 16 '14 at 20:19
  • @borfast Did you try to put `RedirectMatch` inside of `Directory`? Or, at least, after the `DocumentRoot` (not sure that it matters, but worth trying). I used `.htaccess`, but can try to test in config too. – Cheery Oct 16 '14 at 20:20
  • Had not tried it before but just tried it now, both possibilities. Even tried it on another browser. No joy. – borfast Oct 16 '14 at 20:33
  • @borfast just tested in `VirtualHost` - works perfectly. Only added `RedirectMatch 302 ^/(?!bt/sub/(went_active|success|cancel|expired)$).*$ https://localhost ` to it, nothing else. Something is wrong with your Apache. – Cheery Oct 16 '14 at 20:40
  • The interesting bit is that our live server is having the exact same behaviour as my local dev environment, and this is a brand new server with the latest Ubuntu. This is driving me nuts... :| – borfast Oct 16 '14 at 22:03
  • @borfast try the rule in `.htaccess`. Also you might have some other rules interfering with this one. Or, for example, you finally get into the `php` or other script that redirects you. – Cheery Oct 16 '14 at 22:08
  • I just tried something that doesn't even involve PHP (it wasn't installed) and got the same results. I did notice a pattern, though. Check the update at the bottom of my question. – borfast Oct 16 '14 at 23:11
  • @borfast do you actually have those folders on the server? The problem might be inside of them - apache checks `.htaccess` in each of them coming up from the directory to the root of the server. If directories exist, another .htaccess in them may be used first. `403 Forbidden` probably means that you get to the existing directory with indexing of the files in it turned off. – Cheery Oct 16 '14 at 23:16
  • No, not in the original set up; they're just paths handled by a PHP/Laravel application. But on the test VM there was only one directory: `/home/vagrant/www/bt/sub/went_active`, and `/home/vagrant/www/bt/sub/success` didn't exist. It was also just a plain virtual host with no application installed. There was no .htaccess in that one directory, or anything else involved. In fact the VM didn't even have PHP, it was a plain Ubuntu Trusty installation with only Apache 2.4 installed on top of it (`sudo apt-get install apache2`). Couldn't get any purer than that for testing, I think. – borfast Oct 16 '14 at 23:31
  • @borfast Sorry, but I can not reproduce a problem - it works in my configuration. It is not completely 'pure' as I'm using it for some other purposes, but nothing extraordinary. I'll try a pure installation, but not today. – Cheery Oct 16 '14 at 23:34
  • I understand, and I appreciate your time and effort. Hopefully someone will be able to make some sense out of this. – borfast Oct 16 '14 at 23:36
  • @borfast I just made a clean installation of Ubuntu Server (without LAMP or Apache, clean one). Installed Apache, added this `RedirectMatch` and guess - it works! http://oi61.tinypic.com/2hebi8h.jpg – Cheery Oct 16 '14 at 23:52
  • thanks for the update. There is definitely something wrong going on here. I even tried a newly installed browser to make sure it was nothing on the one I've been using, and it still didn't work. Even worse, I get the same results in our production server. It's driving me crazy! But maybe it is something on the client side. I'll try a different computer altogether, see if that helps. – borfast Oct 17 '14 at 09:04
  • @borfast if you can give me access I can take a look at it. – Cheery Oct 17 '14 at 20:29
  • Thanks for the offer, Cheery, but it's a project from the company I work at and I'm afraid I don't have the authorization to allow non-employees to access the code. By the way, just as a curious note to add a bit more madness to the already interesting situation: I also tried the mod_rewrite options and I'm still getting the same result - I always get redirected even when the URL matches the regex. Nuts... – borfast Oct 17 '14 at 23:07
  • @borfast I can look at empty config in your virtual machine. – Cheery Oct 17 '14 at 23:09

See this one

^\/.*(?<!foo\/bar\/(aaa|bbb|ccc|ddd))$

Match / followed by anything, unless the end of the string is preceded by foo/bar/(aaa|bbb|ccc|ddd).

The (?<! is a negative lookbehind attached to the end of string $, which checks that it's impossible to match what's inside just before the end of the string.
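
A small Python sketch of the lookbehind, with the caveat that Python's `re` is a different engine from the PCRE used by Apache and its lookbehind rules are not identical. Python requires every lookbehind to have a single fixed width, so the equal-length endings above compile, while endings of different lengths (such as the real ones from the question) are rejected. The helper names are hypothetical.

```python
import re

# Lookbehind pattern from this answer; all four endings are three characters long.
FIXED = re.compile(r"^\/.*(?<!foo\/bar\/(aaa|bbb|ccc|ddd))$")


def would_redirect(path: str) -> bool:
    """Return True if the lookbehind pattern matches the path (redirect would fire)."""
    return FIXED.match(path) is not None


def compiles(pattern: str) -> bool:
    """Return True if Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    for path in ["/", "/foo/bar/zzz", "/foo/bar/aaa", "/foo/bar/ddd"]:
        print(f"{path:15} redirect={would_redirect(path)}")
    # Endings of different lengths make the lookbehind variable-width.
    print(compiles(r"^/.*(?<!bt/sub/(went_active|success))$"))
```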

If your real case is not as static as your example, give real data for the query part.

Tensibai
  • Negative lookbehind should also work and is also confirmed by the online regex checkers but alas, it doesn't work. More specifically, with this line in the virtual host config file: `RedirectMatch 302 ^\/.*(?<!foo\/bar\/(aaa|bbb|ccc|ddd))$ https ://mysite.com`, I still get redirected for every URL, including http ://mysite.com/foo/bar/aaa (as well as the other three terminations). My case is just this simple one I described. I want every single URL redirected to the HTTPS URL, except those four. No queries involved, just plain URLs. Seemed like something simple at first... :| – borfast Oct 16 '14 at 18:14
  • I may be wrong but is there anything else in the config involving redirect ? I'll have to check negative lookahead and lookbehind in apache, which version are you using ? – Tensibai Oct 16 '14 at 18:23
  • Oops, version in the question, I'll try tomorrow on a 2.4 apache. – Tensibai Oct 16 '14 at 18:24
  • Interesting fact I didn't know: lookbehinds don't work with variable length alternations. Since the endings of my URLs don't all have the same size, a lookbehind probably isn't going to do the trick. The config has `ServerName`, `DocumentRoot` and `SetEnv` directives, and a `<Directory>` section, nothing more. I'm commenting them out to see if it makes a difference. – borfast Oct 16 '14 at 18:33
  • Commenting out the other settings in the file made it work. No idea why, I'm trying them out one by one to determine what's making the difference. ----- EDIT: Nevermind that, false alarm, it was me being daft. – borfast Oct 16 '14 at 18:36
  • The order of directives matters. Is the redirect match before the directory ? And if not could the directory match the Foo part ? – Tensibai Oct 16 '14 at 18:44
  • Ignore my previous comment. Commenting out the "Require all granted" directive in the Directory section was simply resulting in a "403 Forbidden" but in my eagerness to solve this, I was only seeing that it was no longer being redirected. As for the real URLs, I am actually testing this with the foo/bar URLs. The only difference is that instead of "foo" and "bar" the real URLs have other words but that's it. But I'll update my question with the real words. – borfast Oct 16 '14 at 18:44
  • No, I had that in mind and placed the RedirectMatch as the second directive in the file, right after ServerName. – borfast Oct 16 '14 at 18:48
  • I don't know what a negative lookbehind could do here, and regex101 is horrible on a smartphone. I'm giving my little girl her bath, I may try later ;)) – Tensibai Oct 16 '14 at 18:50
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/63184/discussion-between-borfast-and-tensibai). – borfast Oct 16 '14 at 18:51