230

How do I check if a URL exists (not 404) in PHP?

Alastair
X10nD
  • 5
    possible duplicate of [How can one check to see if a remote file exists using PHP?](http://stackoverflow.com/questions/981954/how-can-one-check-to-see-if-a-remote-file-exists-using-php) – viam0Zah Jul 06 '10 at 12:44
  • Note: Several servers do not send any headers back (empty header), so answers that rely on headers are not guaranteed to work. The site can still exist. – Avatar Feb 20 '23 at 14:46

22 Answers

348

Here:

$file = 'http://www.example.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
}
else {
    $exists = true;
}

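From the same source, right below that post, there's a cURL solution (the request has to be executed and its status code checked; creating the handle alone says nothing about the URL):

function url_exists($url) {
    // curl_init() only creates a handle; the request is made by curl_exec().
    $ch = curl_init($url);
    curl_setopt_array($ch, [CURLOPT_NOBODY => true, CURLOPT_RETURNTRANSFER => true]);
    return curl_exec($ch) !== false && curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
}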
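
Comparing the whole status line against one fixed string is brittle, because the HTTP version and the reason phrase vary between servers. A minimal sketch of extracting only the numeric code follows; `statusCodeFromLine` is a hypothetical helper name, not part of the original answer, and it assumes the line comes from element 0 of a `get_headers()` result.

```php
<?php
// Hypothetical helper (an assumption for illustration): pull the numeric
// status code out of a status line such as "HTTP/1.0 404 Not Found".
function statusCodeFromLine(string $line): ?int
{
    // Matches "HTTP/<version> <three digits>"; the reason phrase is optional.
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null; // not a status line
}

var_dump(statusCodeFromLine('HTTP/1.0 404 Not Found')); // int(404)
```

With this, the check could become something like `statusCodeFromLine($file_headers[0]) === 404`, which treats `HTTP/1.0` and `HTTP/1.1` responses the same way.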
Theodore R. Smith
karim79
  • 20
    I'm afraid the CURL-way won't work this way. Check this out: http://stackoverflow.com/questions/981954/how-can-one-check-to-see-if-a-remote-file-exists-using-php/982045#982045 – viam0Zah Jul 06 '10 at 12:41
  • Should we close the filehande? – ekerner Apr 20 '11 at 10:53
  • 8
    some websites have a different `$file_headers[0]` on error page. for example, youtube.com. its error page having that value as `HTTP/1.0 404 Not Found`(difference is 1.0 and 1.1). what to do then? – Krishna Raj Mar 17 '12 at 07:34
  • 26
    Perhaps using `strpos($headers[0], '404 Not Found')` might do the trick – alexandru.topliceanu Apr 16 '12 at 07:11
  • This method did not work for me. I used @fopen which worked fine. – Patrick Savalle Mar 18 '13 at 10:52
  • @KrishnaRajK Instead of if($file_headers[0] == 'HTTP/1.1 404 Not Found') you would do if(($file_headers[0] == 'HTTP/1.1 404 Not Found') ||($file_headers[0] == 'HTTP/1.0 404 Not Found')) There may be an even simpler method, but this should get you going. – Malachi Jan 28 '14 at 14:41
  • 2
    @alexandru.topliceanu The "Not Found" text status is optional; developers can put whatever they want in there, it's still valid. – mpen Jan 23 '15 at 16:31
  • 17
    @Mark agreed! To clarify, `strpos($headers[0], '404')` is better! – alexandru.topliceanu Jan 26 '15 at 13:03
  • 1
    @karim79 beware of SSRF and XSPA attacks – M Rostami Mar 03 '15 at 22:25
  • Please fix the curl example! – Korri Jul 14 '15 at 14:50
  • Do keep in mind that get_headers will return boolean false if an error occurs (eg. DNS not resolving). So be sure to first check if get_headers is not boolean false before testing if it contains HTTP response codes.. – 4levels Jul 31 '16 at 21:22
  • To save time and bandwidth, you could use HEAD request instead of GET with : stream_context_set_default( array( 'http' => array( 'method' => 'HEAD', 'timeout' => 1.5, 'ignore_errors' => true, ) ) ); – fred727 Aug 29 '16 at 14:30
  • 1
    Just a heads up: this won't work if your url gets redirected. This could easily happen if you try to check a url with a domain that redirects domain.tld requests to www.domain.tld. The first header in this case is 301 moved permanently or something similar and the 404 header is after all the redirect headers. – Splatbang Nov 28 '19 at 09:50
  • Use `if ($file_headers && strpos( $file_headers[0], '200')) {` to make sure there are no server errors. – milkovsky Feb 26 '20 at 11:26
  • 1
    Neither of these works. Returns 1 every time, regardless of whether the site exists or not. – WilliamK Sep 12 '20 at 23:03
  • what if the server is using HTTP/0.9? or returning [HTTP 410 Gone](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/410)? – hanshenrik Apr 02 '23 at 11:11
62

When checking whether a URL exists from PHP, there are a few things to pay attention to:

  • Is the URL itself valid (a string, not empty, well-formed)? This is quick to check server side.
  • Waiting for a response might take time and block code execution.
  • Not all headers returned by get_headers() are well formed.
  • Use cURL (if you can).
  • Avoid fetching the entire body; request only the headers.
  • Consider redirecting URLs:
      • Do you want the first code returned?
      • Or follow all redirects and return the last code?
      • You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.

Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the request has timed out.

For example: the code below could take a LONG time to display the page if the urls are invalid or unreachable:

<?php
$urls = getUrls(); // some function getting say 10 or more external links

foreach($urls as $k=>$url){
  // this could potentially take 0-30 seconds each
  // (more or less depending on connection, target site, timeout settings...)
  if( ! isValidUrl($url) ){
    unset($urls[$k]);
  }
}

echo "yay all done! now show my site";
foreach($urls as $url){
  echo "<a href=\"{$url}\">{$url}</a><br/>";
}
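
When cURL is not available, one way to bound the wait is to pass a per-call stream context to `get_headers()` (the third argument, available since PHP 7.1). The sketch below is an illustration under assumptions: `buildHeadContext` is a hypothetical helper name, and the timeout and redirect limits are arbitrary example values, not recommendations from the answer.

```php
<?php
// Hypothetical helper (an assumption for illustration): build a stream
// context that sends a HEAD request and gives up after a fixed timeout.
function buildHeadContext(float $timeoutSeconds = 3.0, bool $followRedirects = true)
{
    return stream_context_create([
        'http' => [
            'method'          => 'HEAD',                    // headers only, no body
            'timeout'         => $timeoutSeconds,           // read timeout in seconds
            'follow_location' => $followRedirects ? 1 : 0,  // follow Location headers or not
            'max_redirects'   => 5,                         // arbitrary example limit
            'ignore_errors'   => true,                      // still return headers on 4xx/5xx
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out here):
// $headers = @get_headers('http://www.example.com/', false, buildHeadContext(2.5));
```

Unlike `stream_context_set_default()`, this leaves the defaults for other stream calls in the same script untouched.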

The functions below could be helpful; you probably want to modify them to suit your needs:

    function isValidUrl($url){
        // first do some quick sanity checks:
        if(!$url || !is_string($url)){
            return false;
        }
        // quick check url is roughly a valid http request: ( http://blah/... ) 
        if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
            return false;
        }
        // the next bit could be slow:
        if(getHttpResponseCode_using_curl($url) != 200){
//      if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you cant use curl
            return false;
        }
        // all good!
        return true;
    }
    
    function getHttpResponseCode_using_curl($url, $followredirects = true){
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $ch = @curl_init($url);
        if($ch === false){
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // dont need body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
        if($followredirects){
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        }else{
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
        }
//      @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
        @curl_exec($ch);
        if(@curl_errno($ch)){   // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }
    
    function getHttpResponseCode_using_getheaders($url, $followredirects = true){
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $headers = @get_headers($url);
        if($headers && is_array($headers)){
            if($followredirects){
                // we want the last errorcode, reverse array so we start at the end:
                $headers = array_reverse($headers);
            }
            foreach($headers as $hline){
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers :
        return false;
    }
HoldOffHunger
MoonLite
  • for some reason getHttpResponseCode_using_curl() always returns 200 in my case. – Nijboer IT Apr 17 '14 at 08:00
  • 2
    if someone has the same problem, check dns-nameservers.. use opendns with no followredirects http://stackoverflow.com/a/11072947/1829460 – Nijboer IT Apr 17 '14 at 10:37
  • +1 for being the only answer to deal with redirects. Changed the `return $code` to `if($code == 200){return true;} return false;` to sort out only successes – Birrel Apr 10 '16 at 21:08
  • @PKHunter : No. My quick preg_match regex was a simple example and will not match all the urls listed in there. See this test url: https://regex101.com/r/EpyDDc/2/ If you want a better one, replace it with the one listed on your link ( https://mathiasbynens.be/demo/url-regex ) from diegoperini ; it seems to match all of them, see this testlink: https://regex101.com/r/qMQp23/1 – MoonLite Mar 30 '17 at 12:21
  • Finding a lot of valid URLs are returning an CURL error 60 on exec. "SSL certificate problem: unable to get local issuer certificate" – xtempore Nov 03 '21 at 01:31
  • @xtempore : I don't think your issue is related to my code. One of the urls you test is probably a "https://..." url of an API or something, and curl can not resolve the given certificate for that domain as specified by the https protocol. You probably need to download the https ssl certificate from that API domain server, save that somewhere, and supply the full path of that file to curl, e.g. as such: `curl_setopt($curl, CURLOPT_CAINFO, '\absolute\path\to\some\certificate\cert.pem');` (usually it is a .pem file) – MoonLite Nov 05 '21 at 19:39
  • this seems to work but takes forevvverrrrrr – ina Dec 12 '21 at 10:16
50
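$headers = @get_headers($this->_value);
// get_headers() returns false on failure (e.g. DNS error), so check before indexing
if ($headers === false || strpos($headers[0], '200') === false) return false;

So any time you contact a website and get something other than 200 OK, this returns false.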
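
As the comments below point out, 200 is not the only code that indicates a live URL. A minimal sketch of a range-based check follows; `isReachableStatus` is a hypothetical helper name, and treating 3xx redirects as "exists" is a policy assumption that may not suit every use case.

```php
<?php
// Hypothetical helper (an assumption for illustration): decide whether a
// numeric HTTP status code should count as "the URL exists".
function isReachableStatus(int $code, bool $acceptRedirects = true): bool
{
    if ($code >= 200 && $code < 300) {
        return true; // any 2xx success code, not only 200
    }
    // Optionally treat 3xx redirects as reachable as well.
    return $acceptRedirects && $code >= 300 && $code < 400;
}

var_dump(isReachableStatus(204)); // bool(true)
var_dump(isReachableStatus(404)); // bool(false)
```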

Somnath Muluk
lunarnet76
  • 14
    But what if it's a redirect? The domain is still valid, but will be left out. – Eric Leroy Oct 19 '13 at 17:06
  • 4
    Above on one line: `return strpos(@get_headers($url)[0],'200') === false ? false : true`. Might be useful. – Dejv Mar 04 '15 at 13:41
  • $this is in PHP is a reference to the current object. Reference: http://www.php.net/manual/en/language.oop5.basic.php Primer: http://www.phpro.org/tutorials/Object-Oriented-Programming-with-PHP.html Most likely the code snippet was taken from a class and not fixed accordingly. – Marc Witteveen Apr 23 '16 at 21:09
  • Improving Dejv's comment -> return strpos(@get_headers($url)[0],'200'); – Andres Paul Nov 20 '20 at 12:51
  • There is a lot of success response codes, not only 200... – JuliSmz Jul 22 '21 at 13:49
21

cURL is not available on every server; in that case you can use this code:

<?php
$url = 'http://www.example.com';
$array = @get_headers($url);
// get_headers() returns false when the request fails
$string = $array ? $array[0] : '';
if(strpos($string,"200") !== false)
  {
    echo 'url exists';
  }
  else
  {
    echo 'url does not exist';
  }
?>
Minhaz
10

I use this function:

/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }

    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}
Ehsan
9
function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    if ($file_headers === false) return false; // request failed (e.g. DNS error)
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
            if(strstr($file_headers[0], $HeaderVal))
            {
                    $exists = false;
                    break;
            }
    }
    return $exists;
}
leela
  • The php manual advises against using `strstr()` to check the existence of a substring -- it encourages the use of `strpos()`. – mickmackusa Mar 06 '21 at 03:49
8
$url = 'http://google.com';
$not_url = 'stp://google.com';

if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;

// Found 'http://google.com'!Can't find 'stp://google.com'.
Randy Skretka
5
function urlIsOk($url)
{
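    $headers = @get_headers($url);
    if ($headers === false) return false; // a failed request would otherwise yield status 0, which is < 400
    $httpStatus = intval(substr($headers[0], 9, 3));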
    if ($httpStatus<400)
    {
        return true;
    }
    return false;
}
Spir
  • 1
    maybe the server is using HTTP/1.11? or some http version with 3+ digits? it's safer to use `$httpStatus = intval(explode(" ", $header[0], 2)[1]);` (-: – hanshenrik Apr 02 '23 at 11:17
5

karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest:

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
) 

Anyway, this developer demonstrates that cURL is way faster than get_headers():

http://php.net/manual/fr/function.get-headers.php#104723

Since many people asked karim79 to fix his cURL solution, here's the solution I built today.

/**
* Send an HTTP request to the $url and check the header sent back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

        curl_setopt($handle, CURLOPT_USERAGENT, true);


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

Let me explain the cURL options:

CURLOPT_RETURNTRANSFER: return the response as a string instead of printing it to the page.

CURLOPT_SSL_VERIFYPEER: set to false, cURL won't verify the certificate.

CURLOPT_HEADER: include the header in the string.

CURLOPT_NOBODY: don't include the body in the string.

CURLOPT_USERAGENT: some sites need this to function properly (for example: https://plus.google.com).


Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

Additional note 2: I explode the header string and use $headers[0] to be sure to validate only the return code and message (example: 200, 404, 405, etc.)

Additional note 3: Sometimes validating only the code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply the full list of codes to reject.

And, of course, here's the unit test (including all the popular social networks) to back up my code:

public function testIsUrlExists(){

//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));

$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));

$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));

$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));

$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));

$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));

$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));


//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));

$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));

$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));

$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

Great success to all,

Jonathan Parent-Lévesque from Montreal

4

I run some tests to see whether links on my site are valid; they alert me when third parties change their links. I was having an issue with a site whose poorly configured certificate meant that PHP's get_headers() didn't work.

So, I read that cURL was faster and decided to give that a go. Then I had an issue with LinkedIn, which gave me a 999 error; that turned out to be a user agent issue.

I didn't care if the certificate was not valid for this test, and I didn't care if the response was a redirect.

Then I figured I'd use get_headers() anyway if cURL was failing.

Give it a go:

/**
 * returns true/false if the $url is valid.
 *
 * @param string $url assumes this is a valid url.
 *
 * @return bool
 */
private function urlExists(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // do not output response in stdout
    curl_setopt($ch, CURLOPT_NOBODY, true);             // this does a head request to make it faster.
    curl_setopt($ch, CURLOPT_HEADER, true);             // just the headers
    curl_setopt($ch, CURLOPT_SSL_VERIFYSTATUS, false);  // turn off that pesky ssl stuff - some sys admins can't get it right.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // set a real user agent to stop linkedin getting upset.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36');
    curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if (($http_code >= 200 && $http_code < 400) || $http_code === 999) {
        curl_close($ch);
        return true;
    }
    //$error = curl_error($ch); // used for debugging.
    curl_close($ch);

    // just try the get_headers - it might work!
    stream_context_set_default(
        ['http' => ['method' => 'HEAD']]
    );
    $file_headers = @get_headers($url);

    if ($file_headers !== false) {
        $response_code = substr($file_headers[0], 9, 3);
        return $response_code >= 200 && $response_code < 400;
    }

    return false;
}
William Desportes
pgee70
3

pretty fast:

function http_response($url){
    $resURL = curl_init(); 
    curl_setopt($resURL, CURLOPT_URL, $url); 
    curl_setopt($resURL, CURLOPT_RETURNTRANSFER, 1); // do not echo the response
    curl_setopt($resURL, CURLOPT_NOBODY, 1);         // status code only; the original CURLOPT_HEADERFUNCTION callback was never defined
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1); 
    curl_exec ($resURL); 
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE); 
    curl_close ($resURL); 
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) { return 0; } else return 1;
}

echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');
  • Way too complicated :) http://stackoverflow.com/questions/981954/how-can-one-check-to-see-if-a-remote-file-exists-using-php/982045#982045 – Ja͢ck May 16 '12 at 13:52
  • i get this exceptionn when the url exists: Could not call the CURLOPT_HEADERFUNCTION – safiot Sep 29 '12 at 13:14
3

Here is a solution that reads only the first byte of source code... returning false if the file_get_contents fails... This will also work for remote files like images.

 function urlExists($url)
{
    if (@file_get_contents($url, false, null, 0, 1) !== false) // compare with false: a first byte of "0" or an empty body is falsy
    {
        return true;
    }
    return false;
}
Daniel Valland
3

All above solutions + extra sugar. (Ultimate AIO solution)

/**
 * Check that given URL is valid and exists.
 * @param string $url URL to check
 * @return bool TRUE when valid | FALSE anyway
 */
function urlExists ( $url ) {
    // Remove all illegal characters from a url
    $url = filter_var($url, FILTER_SANITIZE_URL);

    // Validate URI
    if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
        // check only for http/https schemes.
        || !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
    ) {
        return false;
    }

    // Check that URL exists
    $file_headers = @get_headers($url);
    return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
}

Example:

var_dump ( urlExists('http://stackoverflow.com/') );
// Output: bool(true)
Junaid Atari
3

to check if url is online or offline ---

function get_http_response_code($theURL) {
    $headers = @get_headers($theURL);
    return substr($headers[0], 9, 3);
}
xpredo
3
function url_exists($url) {
    $headers = @get_headers($url);
    return (strpos($headers[0],'200')===false)? false:true;
}
  • you'd want to implode those headers as other answers noted, the response code can be in a later header – Sandra Aug 16 '22 at 16:21
3

cURL can return the HTTP code; I don't think all that extra code is necessary.

function urlExists($url=NULL)
    {
        if($url == NULL) return false;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch); 
        if($httpcode>=200 && $httpcode<300){
            return true;
        } else {
            return false;
        }
    }
Arun Vitto
1

Another way to check whether a URL is valid:

<?php

  if (isValidURL("http://www.gimepix.com")) {
      echo "URL is valid...";
  } else {
      echo "URL is not valid...";
  }

  function isValidURL($url) {
      $file_headers = @get_headers($url);
      if (strpos($file_headers[0], "200 OK") > 0) {
         return true;
      } else {
        return false;
      }
  }
?>
Nisse Engström
1

One thing to take into consideration when you check the header for a 404 is the case where a site does not generate a 404 immediately.

A lot of sites check whether a page exists in the PHP/ASP (et cetera) source and forward you to a 404 page. In those cases the headers of the generated 404 response are appended to the headers of the first response, so the 404 status is not in the first line of the array but, in the example below, the tenth (index 9).

$array = get_headers($url);
print_r($array); // would generate:

Array ( 
[0] => HTTP/1.0 301 Moved Permanently 
[1] => Date: Fri, 09 Nov 2018 16:12:29 GMT 
[2] => Server: Apache/2.4.34 (FreeBSD) LibreSSL/2.7.4 PHP/7.0.31 
[3] => X-Powered-By: PHP/7.0.31 
[4] => Set-Cookie: landing=%2Freed-diffuser-fig-pudding-50; path=/; HttpOnly 
[5] => Location: /reed-diffuser-fig-pudding-50/ 
[6] => Content-Length: 0 
[7] => Connection: close 
[8] => Content-Type: text/html; charset=utf-8 
[9] => HTTP/1.0 404 Not Found 
[10] => Date: Fri, 09 Nov 2018 16:12:29 GMT 
[11] => Server: Apache/2.4.34 (FreeBSD) LibreSSL/2.7.4 PHP/7.0.31 
[12] => X-Powered-By: PHP/7.0.31 
[13] => Set-Cookie: landing=%2Freed-diffuser-fig-pudding-50%2F; path=/; HttpOnly 
[14] => Connection: close 
[15] => Content-Type: text/html; charset=utf-8 
) 
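
Given output like the above, one option is to scan every line and keep the last status line seen, which corresponds to the final response after all redirects. This is only a sketch: `lastStatusCode` is a hypothetical helper name, and the sample array is a shortened copy of the dump shown above.

```php
<?php
// Hypothetical helper (an assumption for illustration): return the status
// code of the LAST response found in a get_headers() array.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code; // null when no status line was found
}

$headers = [
    'HTTP/1.0 301 Moved Permanently',
    'Location: /reed-diffuser-fig-pudding-50/',
    'HTTP/1.0 404 Not Found',
    'Content-Type: text/html; charset=utf-8',
];
var_dump(lastStatusCode($headers)); // int(404)
```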
Lexib0y
0

the simple way is curl (and FASTER too)

<?php
$mylinks="http://site.com/page.html";
$handlerr = curl_init($mylinks);
curl_setopt($handlerr,  CURLOPT_RETURNTRANSFER, TRUE);
$resp = curl_exec($handlerr);
$ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);


if ($ht == 0 || $ht == 404)   // 0 means the request itself failed
     { echo 'NO';}
else { echo 'OK';}

?>
T.Todua
0

get_headers() returns an array with the headers sent by the server in response to an HTTP request.

$image_path = 'https://your-domain.com/assets/img/image.jpg';

$file_headers = @get_headers($image_path);
//Prints the response out in an array
//print_r($file_headers); 

if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found'){
   echo 'Failed because path does not exist.<br/>';
}else{
   echo 'It works. You\'re good to go!<br/>';
}
Jeacovy Gayle
0

The best and simplest answer so far uses get_headers(). The best thing is to check for the string "200 ok". It's far better than checking

$file_headers = @get_headers($file_path);
$file_headers[0];

because sometimes the array key numbers vary. So the best thing is to check for "200 ok". A URL that is up and returns a successful response will have "200 OK" somewhere in the get_headers() response.

function url_exist($url) {
        $urlheaders = @get_headers($url);
        if ($urlheaders === false) return false; // request failed, nothing to search
        //print_r($urlheaders);
        $urlmatches  = preg_grep('/200 ok/i', $urlheaders);
         if(!empty($urlmatches)){
           return true;
         }else{
           return false;
         }
}

Now check whether the function returns true or false:

if (url_exist($url)) {
    echo 'URL exists';
} else {
    echo "URL doesn't exist";
}
  • don't need `preg_grep` if you're literally checking for a string - `stripos` should be sufficient - eg `return ( stripos( '200', implode(",",$urlheaders) ) !== false );` – Sandra Aug 16 '22 at 16:19
-1

kind of an old thread, but.. i do this:

$file = 'http://www.google.com';
$file_headers = @get_headers($file);
if ($file_headers) {
    $exists = true;
} else {
    $exists = false;
}