I am in the process of converting my site, which has many static HTML pages, to a database-driven site. I don't want to lose what Google has already indexed, so I would like to rewrite requests so that they are sent through a PHP script that looks up the content for the requested file path in the database. My understanding is that mod_rewrite would best serve this purpose; unfortunately I have never used it, so I am a bit lost.

What I have:

www.domain.com/index.html
www.domain.com/page.html?var=123&flag=true
www.domain.com/folder/subfolder/
www.domain.com/folder/subfolder/index.html
www.domain.com/folder/subfolder/new/test.html
www.domain.com/folder/subfolder/new/test.html?var=123&flag=true

What I want (I probably also need to urlencode the path; passing the full URI is also OK):

www.domain.com/index.php?page=/index.html OR www.domain.com/index.php?page=www.domain.com/index.html
www.domain.com/index.php?page=/page.html?var=123&flag=true
www.domain.com/index.php?page=/folder/subfolder/
www.domain.com/index.php?page=/folder/subfolder/index.html
www.domain.com/index.php?page=/folder/subfolder/new/test.html
www.domain.com/index.php?page=/folder/subfolder/new/test.html?var=123&flag=true

Here's my first go at it:

RewriteEngine On  # Turn on rewriting
RewriteCond %{REQUEST_URI} .* # Do I even need this?
RewriteRule ^(.*)$ /index.php?page=$1

Ideas? Thanks in advance :)

Edit:

So I tried implementing Ragnar's solution, but I kept getting 500 errors when I use 'RewriteCond $1' or include the '/' on the last line. I have set up a test.php file that echoes $_GET["page"], so I can check whether the rewrite is working correctly. So far I can get some of the correct output (but only when I am not in the root), for example:

RewriteEngine on
RewriteRule ^page/(.*)$ test.php?page=$1 [L]

If I visit the page http://www.domain.com/page/test/subdirectory/page.html?var=123 it will output 'test/subdirectory/page.html' (missing the querystring, which I need). However, if I use this example:

RewriteEngine on
RewriteRule ^(.*)$ test.php?page=$1 [L]

If I visit http://www.domain.com/page/test/subdirectory/page.html?var=123 it will only output 'test.php' which is thoroughly confusing. Thoughts?

Edit #2:

It seems I've been going about this all wrong. I just wanted to be able to use the full URI in my PHP script. The final working solution is the following:

Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /test.php

Then in my php script, I can use $_SERVER['REQUEST_URI'] to get what I need. I knew this should have been easier than what I was trying...
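
For anyone landing here later, a minimal sketch of what test.php might do with that value, assuming only the built-in parse_url(), parse_str() and rawurldecode() functions. The helper name splitRequestUri and the fallback URI are made up for illustration:

```php
<?php
// Hypothetical helper (not part of any library): split a request URI
// into its decoded path and an array of query parameters.
function splitRequestUri(string $requestUri): array
{
    $parts = parse_url($requestUri);
    if ($parts === false) {
        // parse_url() returns false on seriously malformed input;
        // falling back to the root is an assumption of this sketch.
        return ['path' => '/', 'query' => []];
    }

    $query = [];
    parse_str($parts['query'] ?? '', $query);

    return [
        'path'  => rawurldecode($parts['path'] ?? '/'),
        'query' => $query,
    ];
}

// REQUEST_URI is not set on the command line, so use a sample value there.
$uri = $_SERVER['REQUEST_URI'] ?? '/folder/subfolder/new/test.html?var=123&flag=true';
print_r(splitRequestUri($uri));
```

The path can then be used as the lookup key in the database, while the query parameters stay available separately.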

veerman

2 Answers

There's no need for so many lines; they only complicate things.

All you need is 2 lines in .htaccess:

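RewriteEngine on
# RewriteRule: arg 1 (the pattern) is matched against the path relative to the .htaccess folder.
# Arg 2 (the substitution), if relative, is resolved against RewriteBase.
RewriteRule !^foo/bar/index\.php$ /foo/bar/index.php
# The negated pattern must fail to match the target URL itself, otherwise the rule would loop.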

In PHP

You can output the originally requested URI (the path plus the query string) in PHP using:

<?=$_SERVER['REQUEST_URI'];

That gives you full control: parse $_SERVER["REQUEST_URI"] yourself and you can serve entirely different pages depending on its value.
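
As a rough sketch of that idea: the $pages array below is a hypothetical stand-in for the asker's database table, and resolvePage is a made-up helper name. Treating a directory request as a request for its index.html is also an assumption, based on the URLs in the question.

```php
<?php
// Hypothetical stand-in for the database table mapping old paths to content.
$pages = [
    '/index.html'                  => 'Home page',
    '/folder/subfolder/index.html' => 'Subfolder index',
];

// Hypothetical helper: pick the content for a request URI, or null if unknown.
function resolvePage(array $pages, string $requestUri): ?string
{
    // Only the path selects the page, so drop the query string.
    $path = explode('?', $requestUri, 2)[0];
    if ($path === '') {
        $path = '/';
    }
    // Assumption: a directory request means that directory's index.html.
    if (substr($path, -1) === '/') {
        $path .= 'index.html';
    }
    return $pages[$path] ?? null;
}

$content = resolvePage($pages, $_SERVER['REQUEST_URI'] ?? '/');
if ($content === null) {
    http_response_code(404);
    $content = 'Not found';
}
// Escape before echoing, since the content may come from untrusted input.
echo htmlspecialchars($content);
```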


Security caveats

Note that your server may normalize the path, possibly in a buggy way, before RewriteRule sees it. (You can't override this behavior without server privileges.) For example, if the user visits /foo//// you may only see /foo/ or /foo; if the user visits ///foo you may only see /foo; and if the user visits /a/../foo you may only see /foo. If the user visits /a//b/../../foo you may see either /foo or /a/foo, because some buggy production servers treat each of several consecutive slashes as a distinct segment when resolving "..", no kidding.
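
If your script needs one canonical form regardless of what the server did, you could normalize the path yourself before using it as a lookup key. A minimal sketch, where normalizePath is a made-up helper and the policy (collapse repeated slashes first, then resolve "." and "..", drop any trailing slash) is one possible choice rather than what any particular server does:

```php
<?php
// Hypothetical helper: reduce a URL path to a canonical form.
function normalizePath(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            // Empty segments come from leading, trailing or repeated slashes.
            continue;
        }
        if ($segment === '..') {
            // Step up one level; at the root this is a no-op.
            array_pop($out);
            continue;
        }
        $out[] = $segment;
    }
    return '/' . implode('/', $out);
}

echo normalizePath('/a//b/../../foo'), "\n";
```

With this policy, all four example paths above come out as /foo.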


With circuit break

mod_rewrite "circuit-breaks" (stops rewriting) when the current input is identical to the interpreted second argument of RewriteRule, taken relative to the folder containing the .htaccess file. (Personally I'd disable the circuit break to reduce rule complexity, but there seems to be no way to do so.)

Circuit-break solution:

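RewriteEngine on
# RewriteRule: arg 1 (the pattern) is matched against the path relative to the .htaccess folder.
# Arg 2 (the substitution), if relative, is resolved against RewriteBase.
RewriteRule ^ /foo/bar/index.php
# This relies on the circuit break, so: if arg 2 is absolute, the .htaccess folder must be the root;
# otherwise the .htaccess folder must not appear in the interpreted arg 2.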

Circuit break and unintended use of RewriteBase

The circuit break needs either of:

  1. arg2 [of RewriteRule] is absolute, and the .htaccess parent folder is the root.
  2. arg2 is relative, and that folder is not in the interpreted arg2.

So when that folder is not the root, the circuit break needs arg2 to be relative; and when arg2 is relative, the circuit break needs that folder not to be in the interpreted arg2.

Say we need the circuit break but the .htaccess parent folder is in the interpreted arg2. In that case we can change how arg2 is interpreted via RewriteBase:

RewriteEngine on
# RewriteRule: arg 1 (the pattern) is matched against the path relative to the .htaccess folder.
# Arg 2 (the substitution), if relative, is resolved against RewriteBase.
RewriteRule ^ bar/index.php
# This relies on the circuit break, so: if arg 2 is absolute, the .htaccess folder must be the root;
# otherwise the .htaccess folder must not appear in the interpreted arg 2.
RewriteBase /foo
# RewriteBase must be absolute (start with </>). Path segments are allowed,
# e.g. </../a/..> and </../..///ran/dom-f/oobar/./././../../..////////.//.>
Pacerier

I would recommend looking into the Apache URL Rewriting Guide; it contains extensive information about rewriting, with examples.

If I understand you correctly, you should be able to use something like this:

RewriteEngine on
RewriteCond $1 
RewriteRule ^(.*)$ index.php/?page=$1 [L]

This is very similar to the code you posted. If you want more detailed help, be specific about your problem.

Ragnar123
  • Thanks for the reply, glad to see I am on the right track. Any tips for urlencoding $1 on the last line? – veerman Feb 25 '11 at 21:26
  • @veerman, see [here](http://www.workingwith.me.uk/blog/software/open_source/apache/mod_rewriting_an_entire_site). – Ragnar123 Feb 25 '11 at 21:30
  • @veerman, as long as it's a transparent redirect (i.e. not a redirect to a new domain), you don't have to bother passing `$1`. Just do a transparent redirect and read the path from your script in the usual way: `echo $_SERVER['REQUEST_URI']`. See my answer. – Pacerier Oct 03 '17 at 05:51