7

I want to be able to validate a form to check if a website/webpage exists. If it returns a 404 error then that definitely shouldn't validate. If there is a redirect...I'm open to suggestions, sometimes redirects go to an error page or homepage, sometimes they go to the page you were looking for, so I don't know. Perhaps for a redirect there could be a special notice that suggests the destination address to the user.

The best thing I found so far was like this:

$.ajax({url: webpage ,type:'HEAD',error:function(){
    alert('No go.');
}});

That has no problem with 404s and 200s, but if you use something like 'http://xyz' for the URL it just hangs. Also, 302 and similar responses trigger the error handler too.

This is a generic enough question I would like a complete working code example if somebody can make one. This could be handy for lots of people to use.

Yi Jiang
Moss
  • I'd like to know this as well. – esqew Aug 21 '10 at 03:13
  • By exist, you mean up and running, or just the domain. – Dejan Marjanović Aug 21 '10 at 03:24
  • Could you redesign the interaction so that the URL could be verified server-side without having the user wait for it? – SargeATM Aug 21 '10 at 03:40
  • Note that you also risk getting a 405 Method Not Allowed (which you can read as: not available/implemented) for a HEAD request while a GET returns a perfectly fine 200. You may want to consider testing with GET only. – BalusC Aug 21 '10 at 04:18
  • @Webarto the idea is to check if there is a legitimate page there so the form doesn't accept "broken links". @SargeATM PHP is fine if that is the only/best way; I can always use jQuery to poll the PHP code. – Moss Aug 21 '10 at 09:14

4 Answers

4

It sounds like you don't care about the web page's contents, you just want to see if it exists. Here's how I'd do it in PHP - I can stop PHP from taking up memory with the page's contents.

/*
 * Returns false if the page could not be retrieved (i.e., no 2xx or 3xx HTTP
 * status code). On success, if $includeContents = false (default), then we
 * return true - if it's true, then we return file_get_contents()'s result (a
 * string of page content).
 */
function getURL($url, $includeContents = false)
{
  if($includeContents)
    return @file_get_contents($url);

  // A length of 0 reads no body. Pass false (not null) for $use_include_path:
  // passing null to that bool parameter is deprecated as of PHP 8.1.
  return (@file_get_contents($url, false, null, 0, 0) !== false);
}

For less verbosity, replace the above function's contents with this.

return ($includeContents)
    ? @file_get_contents($url)
    : (@file_get_contents($url, false, null, 0, 0) !== false);

See http://www.php.net/file_get_contents for details on how to specify HTTP headers using a stream context.
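As a rough illustration of the stream-context route, here is a minimal sketch that sends a HEAD request with a timeout and reads the status code back from the stream's metadata. The names `parseStatusLine()` and `getStatusCode()` and the 5-second default are assumptions made for this example, not anything taken from the manual page. Redirects are deliberately not followed, so a 3xx code comes back as-is and the caller can decide what to tell the user.

```php
<?php
// Sketch only: function names and the default timeout are illustrative assumptions.

/**
 * Extracts the numeric code from a status line such as "HTTP/1.1 404 Not Found".
 * Returns null if the line does not look like an HTTP status line.
 */
function parseStatusLine(string $line): ?int
{
    return preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m) ? (int) $m[1] : null;
}

/**
 * Returns the HTTP status code of the first response for $url, or null if
 * no response was received (DNS failure, refused connection, timeout, ...).
 */
function getStatusCode(string $url, float $timeout = 5.0): ?int
{
    $context = stream_context_create([
        'http' => [                      // the 'http' key also applies to https:// URLs
            'method'          => 'HEAD', // some servers answer HEAD with 405; GET is the fallback
            'timeout'         => $timeout,
            'follow_location' => 0,      // report 3xx instead of silently following it
            'ignore_errors'   => true,   // do not treat 4xx/5xx as an fopen() failure
        ],
    ]);

    $stream = @fopen($url, 'r', false, $context);
    if ($stream === false) {
        return null;
    }

    // For the http wrapper, wrapper_data holds the response header lines;
    // the first entry is the status line.
    $meta = stream_get_meta_data($stream);
    fclose($stream);

    $headers = $meta['wrapper_data'] ?? [];
    return (isset($headers[0]) && is_string($headers[0]))
        ? parseStatusLine($headers[0])
        : null;
}
```

A caller might treat 2xx as valid, reject 404, and for 3xx show the `Location` header (also present in `wrapper_data`) as a suggested address.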

Cheers.

Sam Bisbee
  • That is brilliantly simple and it works. Perhaps too simple, since there is no way to handle the different codes if desired. But it probably works for my current needs. – Moss Aug 21 '10 at 05:55
  • Your comment is very much appreciated. :-) If you want to get into switching on HTTP codes, then you'll have to use a mechanism like cURL or sockets. Sockets are nice because they don't drag in another dependency, and cURL can be a pain to configure, but you don't get native HTTPS support with sockets. The above method does support HTTPS and many other protocols out of the box (http://us2.php.net/manual/en/wrappers.php). Cheers. – Sam Bisbee Aug 23 '10 at 17:22
1

First you need to check that the host name resolves via DNS. That's why you say it "just hangs" - it's waiting for the DNS query to time out. It's not actually hung.

After checking DNS, check that you can connect to the server. This is another long timeout if you're not careful.

Finally, perform the HTTP HEAD and check the status code. There are many, many, many special cases you have to consider here: what does a "temporary internal server error" mean for the page existing? What about "permanently moved"? Look into HTTP status codes.
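The three steps above could be sketched in PHP roughly as follows. Everything named here (`checkUrl()`, `classifyStatus()`, the result strings, the 3-second timeout) is an assumption chosen for illustration, and how each status class should count toward "exists" is left to the caller. Note that `gethostbyname()` only looks up IPv4 addresses and that the `ssl://` transport needs the OpenSSL extension.

```php
<?php
// Sketch only: names, result strings and the timeout are illustrative assumptions.

/** Maps a status code to a coarse category the form can act on. */
function classifyStatus(int $code): string
{
    if ($code >= 200 && $code < 300) return 'ok';
    if ($code >= 300 && $code < 400) return 'redirect';
    if ($code === 404 || $code === 410) return 'not-found';
    if ($code >= 400 && $code < 500) return 'client-error';
    if ($code >= 500 && $code < 600) return 'server-error';
    return 'other';
}

function checkUrl(string $url, float $timeout = 3.0): string
{
    $parts  = parse_url($url);
    $scheme = is_array($parts) ? strtolower($parts['scheme'] ?? '') : '';
    if (!is_array($parts) || empty($parts['host']) || !in_array($scheme, ['http', 'https'], true)) {
        return 'invalid-url';
    }
    $host  = $parts['host'];
    $https = ($scheme === 'https');
    $port  = $parts['port'] ?? ($https ? 443 : 80);

    // Step 1: DNS. gethostbyname() returns its argument unchanged on failure.
    if (filter_var($host, FILTER_VALIDATE_IP) === false && gethostbyname($host) === $host) {
        return 'dns-failed';
    }

    // Step 2: connect, with an explicit timeout so a dead server fails fast.
    $socket = @fsockopen(($https ? 'ssl://' : '') . $host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return 'connect-failed';
    }
    stream_set_timeout($socket, (int) ceil($timeout)); // read timeout for step 3

    // Step 3: HEAD request; only the status line of the reply is read.
    $path       = ($parts['path'] ?? '/') . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $hostHeader = isset($parts['port']) ? "$host:$port" : $host;
    fwrite($socket, "HEAD $path HTTP/1.1\r\nHost: $hostHeader\r\nConnection: close\r\n\r\n");
    $statusLine = fgets($socket);
    fclose($socket);

    if ($statusLine === false || !preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return 'no-http-response';
    }
    return classifyStatus((int) $m[1]);
}
```

Returning a category rather than a boolean keeps the special cases visible: a form could accept `ok`, reject `dns-failed` and `not-found`, and show a notice for `redirect` or `server-error`.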

Borealid
1

I've just written a simpler version using PHP:

function url_check($url) {
    $x = @fopen($url, "r");

    if ($x) {
        $reply = 1;
        fclose($x);
    } else {
        $reply = 0;
    }

    return $reply;
}

Here $url is the URL to test; the function returns 1 (true) or 0 (false) depending on whether the URL could be opened.

Kirk Woll
Andy Ellis
0

Maybe you could combine a domain checker with jQuery: the domain checker (PHP) can respond with 1 for existing domains or 0 for non-existent ones.

E.g. http://webarto.com/snajper.php?domena=stackoverflow.com will return 1; you can use the input's blur event to check it instantly.
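For readers who want to host such a checker themselves, a minimal stand-in might look like the sketch below. It is not the code behind the URL above: `domainExists()` and the host-name pattern are assumptions for this example, and only the `domena` parameter name is borrowed from that URL. It answers whether the name has an A or AAAA record, which says nothing about whether a particular page exists.

```php
<?php
// Hypothetical stand-in for a domain checker; not the script behind the URL above.

/** Returns true if $domain looks like a host name and has an A or AAAA record. */
function domainExists(string $domain): bool
{
    $domain = strtolower(trim($domain));

    // Accept only plain dotted host names, so arbitrary input never reaches the resolver.
    $pattern = '/^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/';
    if (!preg_match($pattern, $domain)) {
        return false;
    }

    return checkdnsrr($domain, 'A') || checkdnsrr($domain, 'AAAA');
}

// When called over HTTP with ?domena=example.com, reply with a bare 1 or 0.
if (isset($_GET['domena']) && is_string($_GET['domena'])) {
    header('Content-Type: text/plain');
    echo domainExists($_GET['domena']) ? '1' : '0';
}
```

A blur handler on the form field could then request this script and flag the field when the reply is 0.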

Dejan Marjanović