
I'm attempting to detect broken links on a web page using JavaScript, and I've run into a problem. Is there any way to detect non-existent URLs using client-side JavaScript, as seen below?

function URLExists(theURL){
    //return true if the URL actually exists, and return false if it does not exist
}

//test different URLs to see if they exist
alert(URLExists("https://www.google.com/")); //should print the message "true";

alert(URLExists("http://www.i-made-this-url-up-and-it-doesnt-exist.com/")); //should print the message "false";
Anderson Green

1 Answer


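Because of the same-origin policy, client-side JavaScript generally cannot read the response to a cross-origin request, so you would need to create a proxy on a server that accesses the site and sends back its availability status, for example using curl: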

<?php

$data = '{"error":"invalid call"}'; // JSON string returned when no url parameter is given
if (array_key_exists('url', $_GET)) {
  $url = $_GET['url'];
  $handle = curl_init($url);
  curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

  /* Get the HTML or whatever is linked in $url. */
  $response = curl_exec($handle);
  /* HTTP status code of the response; 0 if no response was received. */
  $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
  curl_close($handle);

  $data = json_encode(array('status' => (string) $httpCode)); // e.g. {"status":"200"}

  if (array_key_exists('callback', $_GET)) {
    // JSONP call: wrap the JSON in the callback function named by the caller
    header('Content-Type: text/javascript; charset=utf-8');
    header('Access-Control-Allow-Origin: http://www.example.com'); // an origin has no trailing slash
    header('Access-Control-Max-Age: 3628800');
    header('Access-Control-Allow-Methods: GET, POST, PUT, DELETE');

    // Keep only identifier characters so the callback cannot inject script
    $callback = preg_replace('/[^A-Za-z0-9_.$]/', '', $_GET['callback']);
    die($callback.'('.$data.');');
  }
}
// normal JSON string
header('Content-Type: application/json; charset=utf-8');
echo $data;

?>

You can now send an Ajax request to that script with the URL you want to test and read the returned status, either as a JSON or as a JSONP call.
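As a rough illustration of the client side of that call, here is a minimal sketch of the `URLExists` function from the question built on the proxy. The endpoint path `/check-url.php`, the helper name `statusMeansExists`, and the choice to count 2xx and 3xx codes as "exists" are all assumptions made for this example, not something the script above prescribes. The result is asynchronous, so the function returns a Promise rather than a plain boolean.

```javascript
// Hypothetical location of the PHP proxy; adjust to wherever the script is deployed.
const PROXY_ENDPOINT = "/check-url.php";

// Assumption: 2xx and 3xx count as "exists". The proxy reports "0" when curl got no response.
function statusMeansExists(status) {
  const code = Number(status);
  return code >= 200 && code < 400;
}

// fetchImpl is injectable so the function can be exercised without a network.
async function urlExists(theURL, fetchImpl = fetch, endpoint = PROXY_ENDPOINT) {
  const response = await fetchImpl(endpoint + "?url=" + encodeURIComponent(theURL));
  if (!response.ok) {
    throw new Error("Proxy request failed with HTTP " + response.status);
  }
  // The proxy answers with {"status":"200"} or {"error":"invalid call"}
  const data = await response.json();
  if (data.error) {
    throw new Error("Proxy error: " + data.error);
  }
  return statusMeansExists(data.status);
}
```

It could then be used as `urlExists("https://www.google.com/").then(alert);`, with the caveat that a site which blocks or redirects the server's request may be reported differently from what a browser would see.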


The best client-only workaround I have found is to load a site's logo or favicon and use onerror/onload. However, that does not tell us whether a specific page is missing, only whether the site is down or has removed its favicon/logo:

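function isValidSite(url, div) {
  var img = new Image();
  // Fires when the image cannot be loaded: site unreachable or no favicon.ico
  img.onerror = function() {
    document.getElementById(div).innerHTML = 'Site ' + url + ' does not exist or has no favicon.ico';
  };
  // Fires once the favicon has loaded, so the site is responding
  img.onload = function() {
    document.getElementById(div).innerHTML = 'Site ' + url + ' found';
  };
  img.src = url + "favicon.ico"; // url must end with a slash
}

isValidSite("http://google.com/", "googleDiv");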
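The same favicon check can also be wrapped in a Promise with a timeout, so that a site which never answers still produces a result. This is only a sketch of the workaround above and shares its unreliability. The function name `probeFavicon`, the 5 second default timeout, and the injectable `createImage` parameter are assumptions introduced for this example.

```javascript
// createImage is injectable (an assumption made for testability); browsers use new Image().
function probeFavicon(siteUrl, timeoutMs = 5000, createImage = () => new Image()) {
  return new Promise((resolve) => {
    const img = createImage();
    // Give up and report false if neither event fires in time
    const timer = setTimeout(() => resolve(false), timeoutMs);
    img.onload = () => { clearTimeout(timer); resolve(true); };
    img.onerror = () => { clearTimeout(timer); resolve(false); };
    // Resolve against the site root, so a missing trailing slash does not matter
    img.src = new URL("/favicon.ico", siteUrl).href;
  });
}
```

A call such as `probeFavicon("http://google.com").then(function (found) { /* update the page */ });` would report `false` both for an unreachable site and for a site that simply has no `/favicon.ico`.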
mplungjan
  • How would you load the favicon/logo without knowing its path? And why would this not fall under SOP? – Bergi Jan 20 '13 at 09:27
  • Check the `Favicon method` in this answer http://stackoverflow.com/a/13664860/149636, @Bergi this workaround is possible as the `<img>` tag does not fall under SOP restrictions – lostsource Jan 20 '13 at 09:33
  • @lostsource: It does, you cannot access its contents. Still, abusing the `error` event on images is not a reliable workaround :-) – Bergi Jan 20 '13 at 09:43
  • That is correct, it is not reliable, but better than nothing. A server process using curl or similar is the better option - see updates – mplungjan Jan 20 '13 at 13:10