
I have read many questions on this topic. Basically, I'm using a combination of get_headers() and cURL to check whether a URL exists.

$url = "http://www.asdkkk.com";
// get_headers() returns false when no response arrives at all (e.g. DNS failure)
$headers = @get_headers($url);

if ($headers !== false && strpos($headers[0], '404') === false) {

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_HTTPHEADER     => array("Accept-Language: en-US;q=0.6,en;q=0.4"),
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6',
    ));
    $data = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch); // release the handle whatever the status code
    if ($httpCode != 404) {
        return $data;
    }
} else {
    echo "URL Not Exists";
}

Both functions return status code 200 for the URL "http://www.asdkkk.com". That URL shows a "page not found" page, but the site seems to be hosted and the response status is not set to 404. I have tried other URLs too, not only this one. So how can I determine, accurately, whether a URL actually exists?
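A side note on the get_headers() check above: `$headers[0]` holds only the first status line. When the server redirects, get_headers() (which follows redirects by default) returns the headers of every response in the chain, so the final status is on the last line starting with `HTTP/`. The helper below, `final_status_code()`, is a hypothetical sketch rather than a built-in; it assumes the caller has already checked that get_headers() did not return `false`.

```php
<?php
// Hypothetical helper: return the final HTTP status code from the array
// produced by get_headers(). With redirects, the array contains one status
// line per response, so the last "HTTP/..." line wins.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found" or "HTTP/2 200"
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was found
}

// Example input shaped like get_headers() output after one redirect
$headers = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
echo final_status_code($headers); // prints 200
```

Comparing the integer code (`=== 404`) also avoids matching "404" somewhere else in the status line.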

delto2u
  • possible duplicate of [How can I check if a URL exists via PHP?](http://stackoverflow.com/questions/2280394/how-can-i-check-if-a-url-exists-via-php) – castis Dec 08 '14 at 19:45
  • It isn't; I have read that question before. @castis – delto2u Dec 08 '14 at 19:47
  • If the website displays a "404" message even when it serves a 200 response code, then it is the website that is not behaving properly. You might need to actually parse the response content itself to determine whether it is a "404". – Mike Brant Dec 08 '14 at 19:48
  • What I mean is that the URL will eventually return a 200 status code no matter what. – delto2u Dec 08 '14 at 19:48
  • @MikeBrant can you give me an example, or point me to an article or question? And thanks for the help =D – delto2u Dec 08 '14 at 19:50
  • Returning a 200 instead of a 404 is fundamentally wrong. "These are not the droids you are looking for". – Marc B Dec 08 '14 at 19:52
  • @delto2u The answer would be VERY specific to the broken website in question. You would perhaps need to understand what HTML element the 404 message is displayed in and traverse the DOM to find/evaluate that element. – Mike Brant Dec 08 '14 at 19:54
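Mike Brant's suggestion, inspecting the response body, could look roughly like the sketch below. `looks_like_soft_404()` is a hypothetical helper, and its phrase lists are assumptions that would need tuning for the particular site. It only checks the `<title>` and the visible text, so it will miss error pages worded differently and may flag pages that merely mention "404".

```php
<?php
// Hypothetical helper: guess whether a 200 response is really a
// "page not found" page (a so-called soft 404) by looking at its content.
function looks_like_soft_404(string $html): bool
{
    // Assumed phrase lists; tune them for the site you are checking
    $titlePhrases = array('not found', '404', 'does not exist');
    $bodyPhrases  = array('page not found', '404 not found', 'page does not exist');

    // Extract the <title> text, if there is one
    $title = '';
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        $title = strtolower(html_entity_decode(trim($m[1])));
    }

    // Visible text: drop <script>/<style> blocks, strip the remaining tags,
    // and collapse whitespace so phrases split across lines still match
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    $text = strtolower(strip_tags($text));
    $text = preg_replace('/\s+/', ' ', $text);

    foreach ($titlePhrases as $phrase) {
        if (strpos($title, $phrase) !== false) {
            return true;
        }
    }
    foreach ($bodyPhrases as $phrase) {
        if (strpos($text, $phrase) !== false) {
            return true;
        }
    }
    return false;
}

$html = '<html><head><title>Oops! Page Not Found</title></head><body></body></html>';
var_dump(looks_like_soft_404($html)); // bool(true)
```

In the question's code, this would be applied to `$data` when `$httpCode` is 200.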

1 Answer

I think the issue with your example code is that you are confusing a 404 'Not Found' response from a server with the case of a URL that doesn't point to any server at all. If there is no server response at all, curl_getinfo() reports 0 as the HTTP response code, rather than 404. Try running the code below and see if it works for your purposes:

$urls = array(
    "http://www.asdkkk.com",
    "http://www.google.com/cantfindthisurl",
    "http://www.google.com",
);
$ch = curl_init();
// Return the body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    // 0 means no HTTP response was received at all (e.g. the host does not resolve)
    $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "$http_status for $url <br>";
}
curl_close($ch);
Glen
  • Kindly note the CURLOPT_SSL_VERIFYPEER option: for URLs starting with HTTPS, cURL also verifies the server's certificate, so to check those URLs without certificate verification use `curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);` – luke_mclachlan Apr 06 '16 at 13:58
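The answer separates "no server" (code 0) from a real 404, but a site like the one in the question answers every path with 200. One generic workaround, a different technique from both the answer and the comments, is to request a random path on the same host that almost certainly does not exist and compare the two responses. The sketch below assumes you have fetched both URLs with cURL (for example with the answer's loop, keeping the return value of curl_exec() and enabling CURLOPT_FOLLOWLOCATION) and collected each code and body. `probe_url()`, `page_words()`, `word_similarity()`, `classify_url()` and the 0.9 threshold are all hypothetical, and word-set (Jaccard) similarity is just one cheap way to compare pages.

```php
<?php
// Hypothetical helper: same scheme/host/port as $url, with a random path
// that almost certainly does not exist on the server.
function probe_url(string $url): ?string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null;
    }
    $port = isset($p['port']) ? ':' . $p['port'] : '';
    return $p['scheme'] . '://' . $p['host'] . $port . '/' . bin2hex(random_bytes(8));
}

// Set of lowercase words in the page's text (tags stripped).
function page_words(string $html): array
{
    $text = strtolower(strip_tags($html));
    return array_values(array_unique(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY)));
}

// Jaccard similarity of two word sets: size of intersection / size of union.
function word_similarity(array $a, array $b): float
{
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 1.0; // two empty pages count as identical
    }
    return count(array_intersect($a, $b)) / $union;
}

// Interpret the codes (as reported by curl_getinfo(..., CURLINFO_HTTP_CODE))
// and bodies of the real request and the probe request.
function classify_url(int $code, string $body, int $probeCode, string $probeBody, float $threshold = 0.9): string
{
    if ($code === 0) {
        return 'no response';       // no HTTP response at all, e.g. the host did not resolve
    }
    if ($code >= 400) {
        return 'error ' . $code;    // includes an ordinary 404
    }
    if ($probeCode === 200
        && word_similarity(page_words($body), page_words($probeBody)) >= $threshold) {
        return 'probably soft 404'; // same page the server sends for a bogus path
    }
    return 'exists';
}

echo classify_url(200, '<h1>Page not found</h1>', 200, '<h1>Page not found</h1>'); // prints probably soft 404
```

A site that redirects every unknown path to its home page will make the home page itself look like a soft 404, so the result is a hint rather than proof.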