
The problem I find with filter_var($url, FILTER_VALIDATE_URL) is that it accepts $url = "http://x" (it returns the URL instead of false).

No TLD is required. How can I solve this so that a TLD is required?

itsliamoco
  • [We actually had another question about this today](http://stackoverflow.com/q/26489341/1188035). Check out all the links in there for additional strategies to fully validate a URL. [Something like this library could help you](https://github.com/franksrevenge/StrictUrlValidator) – sjagr Oct 21 '14 at 20:40
  • URLs don't necessarily need a TLD to be valid, e.g. http://localhost/myproject/index.php – boulder_02 Oct 21 '14 at 21:42
  • You can use a regular expression. See here: http://www.regexr.com/38vdq – boulder_02 Oct 21 '14 at 21:46

2 Answers


For TLD validation you need a library that works with the Public Suffix List. Here are two different solutions.

The first is TLDDatabase. Technically, it is only an up-to-date database of TLDs.

$store = new LayerShifter\TLDDatabase\Store();

$store->isICCAN('com'); // returns true
$store->isICCAN('co.uk'); // returns true
$store->isICCAN('example'); // returns false

If you need a more intelligent solution, I recommend TLDExtract. It's a domain parser that you can also use as a validator.

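$extract = new LayerShifter\TLDExtract\Extract();
// Fully qualify the constant; a bare Extract:: would need a matching `use` import.
$extract->setExtractionMode(LayerShifter\TLDExtract\Extract::MODE_ALLOW_ICCAN);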

# For domain 'shop.github.com'

$result = $extract->parse('shop.github.com');
$result->getRegistrableDomain(); // will return 'github.com'
$result->getSuffix(); // will return 'com'    

# For domain 'shop.github.co.uk'

$result = $extract->parse('http://shop.github.co.uk');
$result->getRegistrableDomain(); // will return 'github.co.uk'
$result->getSuffix(); // will return 'co.uk'    

# For domain 'example.example'

$result = $extract->parse('https://example.example');
$result->getRegistrableDomain(); // will return NULL
$result->getSuffix(); // will return NULL

# For domain 'localhost'

$result = $extract->parse('localhost');
$result->getRegistrableDomain(); // will return NULL
$result->getSuffix(); // will return NULL
Oleksandr Fediashov

Any URI starting with a scheme, like http://, and containing valid URI characters after that is valid as per the official URI specification in RFC 3986:

Each URI begins with a scheme name, as defined in Section 3.1, that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

What FILTER_VALIDATE_URL does is correct.

http://localhost or http://x are perfectly valid URIs.
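
To see this behaviour concretely, here is a minimal sketch. The sample URLs are only illustrations. Note that filter_var() returns the filtered string on success and false on failure, so the result is compared against false rather than true.

```php
<?php
// filter_var() returns the URL string when validation passes, or false when it fails.
$samples = ['http://x', 'http://localhost', 'http://example.com', 'x'];

$results = [];
foreach ($samples as $url) {
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
}

foreach ($results as $url => $ok) {
    echo $url, ' => ', $ok ? 'accepted' : 'rejected', PHP_EOL;
}
```

The first three samples are accepted because they have a scheme and a syntactically valid host. Only the bare 'x' is rejected, since it has no scheme.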

If you really want to require and validate the TLD, you have to use a whitelist containing all valid TLDs, because each TLD differs in what counts as a second-level domain, a subdomain, and so on. There are top-level domains, second-level domains, and subdomains; technically speaking, everything except the TLD is a subdomain.
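
A rough sketch of the whitelist idea, under stated assumptions: urlHasAllowedTld() is a hypothetical helper name, and $allowedTlds is a tiny sample rather than a complete or current list. It only inspects the last label of the host, so it does not understand multi-label suffixes such as co.uk; that needs the Public Suffix List.

```php
<?php
// Hypothetical helper: accept a URL only if its host ends in a whitelisted TLD.
function urlHasAllowedTld(string $url, array $allowedTlds): bool
{
    // Reject anything that is not a syntactically valid URL in the first place.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Normalise case and drop a trailing root dot before looking for the last label.
    $host = strtolower(rtrim($host, '.'));
    $lastDot = strrpos($host, '.');
    if ($lastDot === false) {
        return false; // single-label host such as "x" or "localhost"
    }

    $tld = substr($host, $lastDot + 1);
    return in_array($tld, $allowedTlds, true);
}

// Illustrative sample only; a real whitelist must be loaded from a maintained source.
$allowedTlds = ['com', 'org', 'net', 'uk'];

var_dump(urlHasAllowedTld('http://x', $allowedTlds));
var_dump(urlHasAllowedTld('http://shop.github.co.uk', $allowedTlds));
var_dump(urlHasAllowedTld('https://example.example', $allowedTlds));
```

With this sample list, http://x and https://example.example are rejected and http://shop.github.co.uk is accepted.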

You can find a maintained list of TLDs here:

For a PHP implementation (list parser):

From my perspective, this problem can't be solved by a regular expression or by counting the dots in the hostname. One exception: if the validator only ever has to handle a few known URLs, then those strategies might be enough.

Also interesting is the MX record check suggested here: https://stackoverflow.com/a/14688913/1163786
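
A DNS-based check could look roughly like the sketch below. It is not the code from the linked answer. urlHostHasDnsRecord() and its injectable $lookup parameter are hypothetical names, and the default lookup is PHP's built-in checkdnsrr(). Keep in mind that a host without MX records can still serve web pages, so this is a heuristic rather than proof of validity.

```php
<?php
// Hypothetical helper: report whether DNS has a record of $type for the URL's host.
// $lookup is injectable so the check can be stubbed; it defaults to checkdnsrr().
function urlHostHasDnsRecord(string $url, string $type = 'MX', ?callable $lookup = null): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component (for example, a missing scheme)
    }

    $lookup = $lookup ?? 'checkdnsrr';
    return (bool) $lookup($host, $type);
}

// Stub resolver for demonstration, so no network access is needed here.
$stub = function (string $host, string $type): bool {
    return $host === 'example.com' && $type === 'MX';
};

var_dump(urlHostHasDnsRecord('http://example.com/path', 'MX', $stub));
var_dump(urlHostHasDnsRecord('http://x', 'MX', $stub));
```

Calling urlHostHasDnsRecord($url) without the third argument performs a real DNS query, so the result depends on the network and the resolver.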

Jens A. Koch