The problem I find using filter_var($url, FILTER_VALIDATE_URL)
is that it returns true when $url = "http://x";
No TLD is required. How can I solve this so that a TLD is required?
For TLD validation you need a library that works with the Public Suffix List. Here are two different solutions for you.
The first is TLDDatabase; technically it is just an up-to-date database of TLDs.
$store = new LayerShifter\TLDDatabase\Store();
$store->isICANN('com'); // returns true
$store->isICANN('co.uk'); // returns true
$store->isICANN('example'); // returns false
If you need a more intelligent solution, I recommend TLDExtract. It is a domain parser that you can also use as a validator.
use LayerShifter\TLDExtract\Extract;

$extract = new Extract();
$extract->setExtractionMode(Extract::MODE_ALLOW_ICANN);
# For domain 'shop.github.com'
$result = $extract->parse('shop.github.com');
$result->getRegistrableDomain(); // will return 'github.com'
$result->getSuffix(); // will return 'com'
# For domain 'shop.github.co.uk'
$result = $extract->parse('http://shop.github.co.uk');
$result->getRegistrableDomain(); // will return 'github.co.uk'
$result->getSuffix(); // will return 'co.uk'
# For domain 'example.example'
$result = $extract->parse('https://example.example');
$result->getRegistrableDomain(); // will return NULL
$result->getSuffix(); // will return NULL
# For domain 'localhost'
$result = $extract->parse('localhost');
$result->getRegistrableDomain(); // will return NULL
$result->getSuffix(); // will return NULL
Any URI starting with a scheme, like http://, and containing valid URI characters after that is valid as per the official URI specification in RFC 3986:
Each URI begins with a scheme name, as defined in Section 3.1, that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.
What FILTER_VALIDATE_URL does is correct: http://localhost and http://x are perfectly valid URIs.
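To see this for yourself, here is a minimal sketch that runs a few sample URLs (chosen here only for illustration) through FILTER_VALIDATE_URL. Note that filter_var() returns the filtered string on success and false on failure, not true.

```php
<?php
// FILTER_VALIDATE_URL checks URL syntax only; it knows nothing about TLDs.
$candidates = ['http://x', 'http://localhost', 'http://example.com', 'x'];

foreach ($candidates as $url) {
    // filter_var() returns the filtered string on success and false on failure.
    $result = filter_var($url, FILTER_VALIDATE_URL);
    printf("%-20s %s\n", $url, $result === false ? 'rejected' : 'accepted');
}
```

Only the last candidate is rejected, because it has no scheme at all.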
If you really want to require and validate the TLD, you have to use a whitelist containing all the valid TLDs, because TLDs differ in what counts as a subdomain, a second-level domain, and so on. There are top-level domains, second-level domains, and subdomains. Technically speaking, everything except the TLD is a subdomain.
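A rough sketch of the whitelist idea, using only built-in PHP functions. The helper name hasKnownTld() and the five-entry list are made up for illustration; a real list would be loaded from a maintained source. It also only looks at the last label, so it does not understand multi-label suffixes such as co.uk.

```php
<?php
// Hypothetical helper, not part of PHP: require a syntactically valid URL
// whose host ends in a TLD from an allow-list.
function hasKnownTld(string $url, array $knownTlds): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || strpos($host, '.') === false) {
        return false; // no host, or a single label such as "x" or "localhost"
    }

    // The TLD is the last dot-separated label of the host.
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $tld = end($labels);

    return in_array($tld, $knownTlds, true);
}

// Tiny illustrative excerpt (an assumption); a real list is far longer.
$knownTlds = ['com', 'org', 'net', 'uk', 'de'];

var_dump(hasKnownTld('http://x', $knownTlds));                 // bool(false)
var_dump(hasKnownTld('http://example.com', $knownTlds));       // bool(true)
var_dump(hasKnownTld('http://shop.github.co.uk', $knownTlds)); // bool(true)
var_dump(hasKnownTld('https://example.example', $knownTlds));  // bool(false)
```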
You can find a maintained list of TLDs here:
For a PHP implementation (list parser):
From my perspective, this problem can't be solved with a regular expression or by counting the dots in the hostname. One exception: if the validator only ever has to handle a few known URLs, these strategies might be good enough.
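To illustrate why a dot check falls short, here is a small sketch of such a heuristic. The function name hostHasDot() is hypothetical. It rejects http://x as wanted, but it also accepts hosts that have no real TLD.

```php
<?php
// Hypothetical heuristic: treat any host containing a dot as "having a TLD".
function hostHasDot(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);

    return is_string($host) && strpos($host, '.') !== false;
}

var_dump(hostHasDot('http://x'));                // bool(false)
var_dump(hostHasDot('https://example.example')); // bool(true), although "example" is not a real TLD
var_dump(hostHasDot('http://192.168.0.1'));      // bool(true), although this is an IP address
```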
An interesting alternative is the MX record check suggested here: https://stackoverflow.com/a/14688913/1163786
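A minimal sketch of what an MX check could look like, built on PHP's checkdnsrr(). This is not the code from the linked answer; the helper name hostHasMxRecord() and the injectable $lookup parameter are assumptions made here so that the logic can be tried without network access.

```php
<?php
// Hypothetical helper: returns true when the URL's host has an MX record.
// $lookup is injectable for testing; by default it is PHP's checkdnsrr().
function hostHasMxRecord(string $url, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component, so there is nothing to look up
    }

    return (bool) $lookup($host, 'MX');
}

// Stub resolver for a deterministic demo: pretends only example.com has MX records.
$stub = function (string $host, string $type): bool {
    return $host === 'example.com' && $type === 'MX';
};

var_dump(hostHasMxRecord('http://example.com', $stub)); // bool(true)
var_dump(hostHasMxRecord('http://x', $stub));           // bool(false)

// Without the second argument a real DNS query is performed:
// var_dump(hostHasMxRecord('http://example.com'));
```

Keep in mind that this is only a heuristic: a perfectly valid domain may have no MX records, and the check needs a working DNS resolver.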