189

I want to use PHP to check whether the string stored in the $myoutput variable contains valid link syntax or is just normal text. The function or solution I'm looking for should recognize all link formats, including ones with GET parameters.

The solution suggested on many sites, actually requesting the URL (using cURL or the file_get_contents() function), is not possible in my case, and I would like to avoid it.

I thought about regular expressions or another solution.

trejder
CodeOverload
  • Using cURL or fetching the HTTP contents may be slow. If you want something speedier and almost as reliable, consider using gethostbyname() on the hostname. If it resolves to an IP, then it probably has a website. Of course, this depends on your needs. – TravisO Jan 13 '10 at 18:28
  • 1
    I would be interested in the use case for this. – Sybille Peters Jun 26 '21 at 06:47

13 Answers

395

You can use the native URL validation filter:

filter_var($url, FILTER_VALIDATE_URL);

The PHP manual describes it as follows: Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.

Example:

if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
    die('Not a valid URL');
}
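
Because FILTER_VALIDATE_URL accepts any syntactically valid scheme, the scheme usually needs a separate check. The following is a minimal sketch, assuming only http and https links are wanted. The helper name is_http_url is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): accept only http/https URLs.
function is_http_url(string $value): bool
{
    // Step 1: syntax check; filter_var() returns false for invalid input.
    if (filter_var($value, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Step 2: restrict the scheme, since ssh://, ftp:// or even ttps://
    // are syntactically valid and pass step 1.
    $scheme = parse_url($value, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}
```

With this, a link with GET parameters such as http://example.com/page.php?id=1&lang=en is accepted, while plain text and URLs with other schemes are rejected. Nothing is fetched over the network.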
Gordon
  • 9
    @Raveren expected behavior since these are valid URLs. – Gordon Oct 14 '11 at 13:38
  • 20
    Be aware that `FILTER_VALIDATE_URL` will not validate the protocol of a url. So `ssh://`, `ftp://` etc will pass. – Seph May 10 '14 at 14:03
  • 6
    @SephVelut expected behavior since these are valid URLs. – Gordon May 10 '14 at 16:10
  • 1
    @Gordon Still important to point out the caveat. http/https is blankly seen as the de-facto aspect of url's. – Seph May 11 '14 at 01:39
  • 1
    it allows urls like ttp://amazon.com – Elia Weiss Jan 13 '16 at 10:00
  • 1
    @Joti which is a syntactically valid url. See Section 3.1 in http://www.faqs.org/rfcs/rfc2396.html and http://php.net/manual/en/filter.filters.validate.php – Gordon Jan 13 '16 at 10:19
  • @Joti This doesn't allow for relative urls either (like `//example.com/asset.png`) even though it allows things like the partial http strings you found. – dhaupin May 13 '16 at 20:19
  • It's this kind of stuff which is making me fall in lurve with teh php. – vhs Jun 15 '17 at 21:54
  • 1
    @dhaupin It isn't a relative URL. It is a protocol-relative URL. The browser uses the same protocol (http or https) as the page that loads such content. – Jose Nobile Aug 14 '17 at 18:07
  • 6
    @JoshHabdas, I think you're missing the point. The PHP code does exactly what it claims to do. But it can't read your mind. There's a huge difference between invalid and unwanted.. Unwanted is very subjective, which is why it's left to the programmer to work out that detail. You might also note the code validates the URL, but doesn't prove it exists. It's not PHP's fault that a user mistyped "amazon," "amozon," which would validate, but is still unwanted. – JBH Mar 22 '18 at 19:15
  • This function claims http://127.0.0.1:1234:1234 (with double port numbers) is a valid URL. – kbriggs Apr 20 '18 at 21:04
  • @kbriggs it does on PHP 5.2, which is likely a bug. But PHP 5.2 is end of life for several years now and you shouldn't be using it. From PHP 5.2.1 it correctly reports false: https://3v4l.org/XE8ud – Gordon Apr 23 '18 at 06:05
  • @Gordon you are correct. I had run that test on an old local 5.2.8 copy. Trying it again on my production server running 5.6.33 it performs as expected now. – kbriggs Apr 24 '18 at 01:16
  • Returns false for a valid URL that contains non-English characters. – Usama Oct 14 '18 at 13:48
  • Yes, not a perfect solution. Accents are not accepted, and no validation for correct domain. Better use regex. – Medhi Dec 02 '19 at 13:12
  • 1
    Folks, do not use FILTER_VALIDATE_URL. It is messy and unreliable. E.g. it validates `ttps://www.youtube.com` as valid – Jeffz May 17 '20 at 13:23
  • 3
    @Jeffz ttps://www.youtube.com *is* a syntactically valid URL. Mind the quote in the answer. – Gordon May 19 '20 at 15:57
  • 1
    @Gordon No offence, but despite being 'syntactically' valid, it does not work and some coders may end up with a headache. You added some exception in body of answer to warn folks, I added additional, just to make a point for some unaware, that FILTER_VALIDATE_URL is not a silver bullet and maybe coder should also consider using it together with: FILTER_FLAG_SCHEME_REQUIRED, or FILTER_FLAG_HOST_REQUIRED ... or maybe both. Although I probably should have phrased my comment a bit differently. – Jeffz May 21 '20 at 04:08
  • To avoid a SSRF attack you can use: `if (false == ($urlParsed = parse_url($url)) || false === in_array($urlParsed['scheme'], ['http', 'https'])) return false;` (input: $url) – vectorialpx Jun 22 '22 at 08:04
35

Here is the best tutorial I found:

http://www.w3schools.com/php/filter_validate_url.asp

<?php
$url = "http://www.qbaki.com";

// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);

// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    echo("$url is a valid URL");
} else {
    echo("$url is not a valid URL");
}
?>

Possible flags:

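FILTER_FLAG_SCHEME_REQUIRED - URL must include a scheme (like http://example). Deprecated in PHP 7.3 and removed in PHP 8.0, because FILTER_VALIDATE_URL always implies it.
FILTER_FLAG_HOST_REQUIRED - URL must include a host name (like http://www.example.com). Deprecated in PHP 7.3 and removed in PHP 8.0 for the same reason.
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like http://www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like http://www.example.com/example.php?name=Peter&age=37)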
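
A flag is passed as the third argument of filter_var(). The sketch below uses only FILTER_FLAG_PATH_REQUIRED and FILTER_FLAG_QUERY_REQUIRED, because the scheme and host flags no longer exist in PHP 8. The helper names url_has_query and url_has_path are hypothetical.

```php
<?php
// Hypothetical helper: valid URL that also carries a query string.
function url_has_query(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED) !== false;
}

// Hypothetical helper: valid URL that also has a path component.
function url_has_path(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED) !== false;
}
```

For example, url_has_query('http://www.example.com/') is false because there is no query string, and url_has_path('http://www.example.com') is false because nothing follows the host.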
Erich García
  • 4
    Just a nit: `!filter_var(...) === false` ==> `filter_var(...) === true` or just `filter_var(...)`. :) – Domenico De Felice Jun 17 '18 at 07:48
  • @ErichGarcía this code doesn't check that it's a valid HTTP/S URL's like the OP asks. This will pass things like ssh://, ftp:// etc this only checks if its a syntactically valid URL according to RFC 2396 – twigg May 20 '19 at 19:31
  • Do not use FILTER_VALIDATE_URL. It is messy and unreliable. E.g. it validates `ttps://www.youtube.com` as valid – Jeffz May 17 '20 at 13:23
  • 1
    The very necessary filter flags were removed as of PHP 8 – Hobbamok Nov 10 '22 at 22:31
22

Using filter_var() will fail for URLs with non-ASCII characters, e.g. http://pt.wikipedia.org/wiki/Guimarães. The following function percent-encodes all non-ASCII characters in the path (e.g. http://pt.wikipedia.org/wiki/Guimar%C3%A3es) before calling filter_var().

Hope this helps someone.

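<?php

function validate_url($url) {
    $path = parse_url($url, PHP_URL_PATH);

    // Percent-encode each path segment so non-ASCII characters pass filter_var().
    // parse_url() returns null when there is no path and false for malformed input.
    if (is_string($path) && $path !== '') {
        $encoded_path = array_map('urlencode', explode('/', $path));
        $url = str_replace($path, implode('/', $encoded_path), $url);
    }

    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}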

// example
if(!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
}
else {
    echo "IS A URL";
}
Huey Ly
  • This is it. Finally someone came back in 2017 – Kyle KIM Apr 24 '18 at 14:51
  • Works for me (the others do not BTW) :) – Jono Aug 18 '18 at 13:07
  • This is the ONLY solution that worked for me. Thanks! – Silas Dec 06 '19 at 17:47
  • This is not a check which will get 100% correct results! This will only handle non-ascii characters in the path, not in the domain path of the URL. Nowadays, you can also use other unicode chars in the domain - which will be converted to punycode (see https://en.wikipedia.org/wiki/Punycode), e.g. "https://www.guimarães.org". So if you regard the non-punycode converted URLs as valid - your check will fail on these. Even if you handle this in the check, there is still the question of e.g. "ttps://mydomain.org" being falsely interpreted as valid! (as pointed out in other answers) – Sybille Peters Jun 26 '21 at 06:40
  • Not necessary anymore (at least for my PHP 7.4 installation) – rabudde Mar 24 '22 at 12:17
  • I love the ternary for true false returns lol – Jordan Casey Nov 20 '22 at 05:40
11
function is_url($uri){
    if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
      return $uri;
    }
    else{
        return false;
    }
}
Milad Ghiravani
6

Personally, I would use a regular expression here. The code below worked perfectly for me.

$baseUrl     = url('/'); // url() is a framework helper (e.g. Laravel), not core PHP; in my case it returns https://www.xrepeater.com
$posted_url  = "home";
// Test with one by one
/*$posted_url  = "/home";
$posted_url  = "xrepeater.com";
$posted_url  = "www.xrepeater.com";
$posted_url  = "http://www.xrepeater.com";
$posted_url  = "https://www.xrepeater.com";
$posted_url  = "https://xrepeater.com/services";
$posted_url  = "xrepeater.dev/home/test";
$posted_url  = "home/test";*/

$regularExpression  = "((https?|ftp)\:\/\/)?"; // SCHEME Check
$regularExpression .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass Check
$regularExpression .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP Check
$regularExpression .= "(\:[0-9]{2,5})?"; // Port Check
$regularExpression .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path Check
$regularExpression .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query String Check
$regularExpression .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor Check

if (preg_match("/^$regularExpression$/i", $posted_url)) {
    // The scheme alternation must be grouped; "^http|https://" would also match "https://" anywhere in the string
    if (preg_match("@^https?://@i", $posted_url)) {
        // Collapse an accidentally repeated scheme such as "http://http://"
        $final_url = preg_replace("@(http://)+@i", 'http://', $posted_url);
    }
    else {
        $final_url = 'http://'.$posted_url;
    }
}
else {
    if (substr($posted_url, 0, 1) === '/') {
        $final_url = $baseUrl.$posted_url;
    }
    else {
        // Use $posted_url here; $final_url has not been assigned in this branch
        $final_url = $baseUrl."/".$posted_url;
    }
}
6

Actually, filter_var($url, FILTER_VALIDATE_URL) has a limitation: it only checks the syntax, not whether the host exists. A real URL passes, but if you type something like "http://weirtgcyaurbatc", it will still say it's valid.
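
One way to reject such bare host names without any network access is to additionally require a dot in the host. This is only a heuristic sketch, and the helper name is_url_with_dotted_host is hypothetical. It also rejects legitimate local names such as http://localhost and bracketed IPv6 hosts, so it suits only cases where public-looking domains are expected.

```php
<?php
// Hypothetical helper: syntactically valid URL whose host has at least two labels.
function is_url_with_dotted_host(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // Require something like "example.com"; a single label such as
    // "weirtgcyaurbatc" or "localhost" is rejected. This does not prove
    // that the domain actually exists.
    return is_string($host) && strpos(trim($host, '.'), '.') !== false;
}
```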

4

Given issues with filter_var() needing http://, I use:

// parse_url() returns false for malformed input and null when no scheme is present
$is_url = filter_var($filename, FILTER_VALIDATE_URL) !== false || is_string(parse_url($filename, PHP_URL_SCHEME));

Autumn Leonard
  • 2
    Do not use FILTER_VALIDATE_URL. It is messy and unreliable. E.g. it validates `ttps://www.youtube.com` as valid – Jeffz May 17 '20 at 13:24
  • 1
    @Jeffz FILTER_VALIDATE_URL does validate urls. A scheme is not limited to http or https only, these are all valid schemes ftp, mailto, file, data and irc. They are registered with IANA but also non registered schemes can be used. So as per URI definition ttps is a valid scheme – Marina Dunst May 02 '22 at 23:38
  • @MarinaDunst Yeah but `kkdjf://www.youtube.com` is valid too according to FILTER_VALIDATE_URL. It's definitely unreliable. – Paul Phillips Feb 11 '23 at 23:17
3

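You can use this function, but it will return false if the host name cannot be resolved (for example, when the website is offline).

function isValidUrl($url) {
    $url = parse_url($url);
    if (!isset($url["host"])) return false;
    // gethostbyname() returns the unmodified host name when resolution fails
    return gethostbyname($url["host"]) !== $url["host"];
}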
Hasan Veli Soyalan
2

Another way to check whether a given URL is valid is to try to access it. The function below fetches the headers from the given URL, which ensures that the URL is valid AND that the web server is alive:

function is_url($url){
    // Check if URL is empty
    if (empty($url)) {
        return false;
    }
    // get_headers() returns false on failure; @ suppresses the warning it emits
    $response = @get_headers($url);
    if ($response === false) {
        return false;
    }
    // Look for a "200" status line such as "HTTP/1.1 200 OK" (also after redirects)
    return (bool) preg_grep('#^HTTP/\S+\s+200\b#', $response);
}
/* Example of the array returned by get_headers($url, 1):
Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)*/
Bud Damyanov
1

Came across this article from 2012. It takes into account variables that may or may not be just plain URLs.

The author of the article, David Müeller, provides this function that he says, "...could be worth wile [sic]," along with some examples of filter_var and its shortcomings.

/**
 * Modified version of `filter_var`.
 *
 * @param  mixed $url Could be a URL or possibly much more.
 * @return bool
 */
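function validate_url( $url ) {
    $url = trim( $url );

    // FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED are omitted:
    // FILTER_VALIDATE_URL already implies both, and the constants were removed in PHP 8.0.
    // (The original combined them with ||, which is a logical OR, not a bitwise flag mask.)
    return (
        ( strpos( $url, 'http://' ) === 0 || strpos( $url, 'https://' ) === 0 ) &&
        filter_var( $url, FILTER_VALIDATE_URL ) !== false
    );
}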
DaveyJake
  • Works better than simple filter_var, but also validates http://youtube, which basically is a valid url, but a local one (without tld) – NemoXP Dec 15 '20 at 08:48
  • 4
    FILTER_FLAG_ will now be removed in php 8.0, so this seems to be no longer an option. – Andreas Jun 16 '21 at 09:01
0
// Returns true when the URL answers with a 2xx HTTP status (requires the cURL extension)
function testing($Url=''){
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpcode >= 200 && $httpcode < 300;
}
katulamu
0

Here are three separate functions I wrote for this case; I hope they are useful:

/**
 * Check if the string is a relative or absolute URL
 * @param null|string $url The url string
 * @return bool
 */
function isUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^(\w+\:[\/]*)?(\/?[^\/\{\}\|^\[\]\"`\r\n\t\f]){1,}$/",$url);
}
/**
 * Check if the string is only a relative URL
 * @param null|string $url The url string
 * @return bool
 */
function isRelativeUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^(\/?[^\/\{\}\|\^\[\]\"\`\r\n\t\f]){1,}$/",$url);
}
/**
 * Check if the string is only an absolute URL
 * @param null|string $url The url string
 * @return bool
 */
function isAbsoluteUrl(string|null $url):bool{
    return (!empty($url)) && preg_match("/^\w+\:\/*(\/?[^\/\{\}\|^\[\]\"\`\r\n\t\f]){1,}$/",$url);
}

Enjoy...

MiMFa
-2

If anyone is interested in using cURL for validation, you can use the following code.

<?php
function validationUrl($Url){
    if ($Url == NULL){
        return false;
    }
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat any 2xx status as a valid, reachable URL
    return $httpcode >= 200 && $httpcode < 300;
}
VishalParkash