Well, we can't give you a definitive answer, only pointers. I once wrote a search engine in PHP, and the principle is the same:
- First of all, write your script as a console (CLI) script. A web script is not really appropriate, though that is partly a matter of taste.
- You need to understand how to work with sockets in PHP and make requests. Look at the PHP network functions at: http://www.php.net/manual/ref.network.php
- You will need to get familiar with HTTP requests: learn how to build your own GET/POST requests and how to split the headers from the returned content.
- The last part is easy with a regular expression: just preg_match the content for "#()*#i" (or use preg_match_all to get every match rather than only the first). That expression might be wrong, I didn't test it at all, OK?
- Loop over the list of found hrefs, compare them to the hrefs you have already visited (remember that GET params can vary, so the same page may show up under several URLs), and then repeat the process to load all the pages of a site.
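To make the last three steps a bit more concrete, here is a rough sketch of how they could fit together. Treat it as one possible shape, not a finished crawler. The function names (splitResponse, extractLinks, visitedKey, crawl) are made up for this example, the href pattern is an illustrative one and not the untested expression above, and the "sid" parameter is only an assumed example of a GET param you might want to ignore. The $fetch callable stands in for your own socket request, and resolving relative links against the current page is left out.

```php
<?php
// Split a raw HTTP response into [headers, body] at the first blank line.
function splitResponse(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Collect the href value of every <a> tag (illustrative pattern, quoted hrefs only).
function extractLinks(string $html): array
{
    preg_match_all('#<a\s[^>]*href=["\']([^"\']+)["\']#i', $html, $m);
    return array_values(array_unique($m[1]));
}

// Build the key used in the visited list: path plus sorted GET params,
// with the params listed in $ignoredParams dropped ('sid' is an assumption).
function visitedKey(string $url, array $ignoredParams = ['sid']): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    $queryString = parse_url($url, PHP_URL_QUERY);
    parse_str(is_string($queryString) ? $queryString : '', $query);
    foreach ($ignoredParams as $param) {
        unset($query[$param]);
    }
    ksort($query);
    return $query ? $path . '?' . http_build_query($query) : $path;
}

// Crawl loop: fetch a page, pull its links, queue the ones not seen yet.
// $fetch is any callable that returns the raw HTTP response for a URL.
function crawl(string $start, callable $fetch, int $limit = 100): array
{
    $queue = [$start];
    $visited = [];
    while ($queue && count($visited) < $limit) {
        $url = array_shift($queue);
        $key = visitedKey($url);
        if (isset($visited[$key])) {
            continue; // already loaded this page under another URL
        }
        $visited[$key] = true;
        [, $body] = splitResponse($fetch($url));
        foreach (extractLinks($body) as $link) {
            $queue[] = $link;
        }
    }
    return array_keys($visited);
}

// Usage with a fake in-memory "site" instead of real sockets.
$pages = [
    '/'  => '<a href="/a?sid=1">A</a> <a href="/b?y=2&x=1">B</a>',
    '/a' => '<a href="/">home</a> <a href="/a?sid=2">self</a> <a href="/b?x=1&y=2">B</a>',
    '/b' => 'no links here',
];
$fetch = fn(string $url) => "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
    . ($pages[parse_url($url, PHP_URL_PATH)] ?? '');

print_r(crawl('/', $fetch)); // visits "/", "/a" and "/b?x=1&y=2" once each
```

Keeping the fetch step behind a callable means you can test the loop without touching the network, then swap in the real socket code later.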
It IS HARD WORK... good luck