12

I need help creating a regex that will match all URLs. Please do not close this question as a duplicate: I have been looking for what I need for a long time, and none of the answers I have seen solve my problem. It should match, for example:

website.com

www.website.com

http://www.website.com

http://website.com

https://www.website.com

https://website.com

as well as anything trailing, for example:

www.website.com/path-to-something

I am coding something that shortens any URL, but to do so I first need to match them all.

Thanks

Spudley
André Figueira
  • 2
    What's your effort so far? – Rikesh May 10 '13 at 11:53
  • I've tried a whole load of different expressions; regex isn't exactly my forte. `(http://[^ ]+)` is all I have right now, but it only matches one kind. – André Figueira May 10 '13 at 11:53
  • 1
    @Spudley: Don't think so since `website.com` is not a valid URL and the usecase is different. OP wants to search for matching base URLs. – Menno May 10 '13 at 12:06
  • @Aquillo - some of the answers on that other question would work fine. But really, the only difference between a valid URL and `website.com` is making the protocol part optional in the regex. – Spudley May 10 '13 at 12:11
  • 1
    Use the following regex, it's more generic: `preg_match_all('@((((ht)|(f))tp[s]?://)|(www\.))([a-z][-a-z0-9]+\.)?([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+[/]?[a-z0-9._\/~#&=;%+?-]*@si', $input, $result);` – Mohammad Anini Oct 09 '13 at 19:18

4 Answers

21

This one correctly matches everything you posted:

preg_match_all('#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si', $targetString, $result);
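As a rough sketch of how the pattern above could be used to pull candidate URLs out of free text, the helper below wraps the same `preg_match_all` call. The function name `extract_urls` and the sample text are assumptions made for illustration and are not part of the answer. Note that the pattern limits the final domain label to 2-4 letters, so longer TLDs would not match as written.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer):
// returns every substring of $text that the answer's pattern matches.
function extract_urls(string $text): array
{
    // Same pattern as in the answer, unchanged.
    $pattern = '#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si';

    // $result[0] holds the full matches; $result[1] holds the optional path group.
    if (preg_match_all($pattern, $text, $result) === false) {
        return [];
    }
    return $result[0];
}

// Sample input made up for this sketch.
$found = extract_urls('Visit website.com or https://www.website.com/path-to-something today');
print_r($found);
```

Each returned string can then be handed to the shortener; a bare host such as `website.com` would still need a scheme added before it is usable as a link.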
Lakatos Gyula
5

You want to use something like this:

$string = 'www.blah.com';

$temp_string = (!preg_match('#^(ht|f)tps?://#', $string)) // check if protocol not present
    ? 'http://' . $string // temporarily add one
    : $string; // use current

if (filter_var($temp_string, FILTER_VALIDATE_URL))
{
    echo 'is valid';
} else {
    echo 'not valid';
}

This uses PHP's built-in URL validation. It first checks whether a protocol is present; if not, it temporarily adds one to the string being checked, then runs the result through validation. Unlike the currently accepted answer, this is accurate.
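For a shortener it may be handy to keep the string with the scheme attached rather than only echoing a verdict. The sketch below packages the same two steps into a function; the name `normalize_url` and the choice to return `null` on failure are assumptions made here, not something the answer specifies.

```php
<?php
// Hypothetical helper (name and return convention are assumptions):
// returns the candidate with a scheme attached if it passes
// FILTER_VALIDATE_URL, or null if it does not.
function normalize_url(string $candidate): ?string
{
    $candidate = trim($candidate);

    // Prepend http:// only when no http/https/ftp/ftps scheme is present.
    $withScheme = preg_match('#^(ht|f)tps?://#i', $candidate)
        ? $candidate
        : 'http://' . $candidate;

    return filter_var($withScheme, FILTER_VALIDATE_URL) !== false
        ? $withScheme
        : null;
}

var_dump(normalize_url('www.blah.com'));
var_dump(normalize_url('not a url'));
```

Returning the normalized string means the caller does not have to repeat the scheme check before storing or shortening the URL.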

kittycat
  • Does this match things like gooogle.com and www.google.com? – André Figueira May 12 '13 at 10:59
  • yes http://viper-7.com/Jz7nR1 and yes http://viper-7.com/Iv9SiS – kittycat May 12 '13 at 11:03
  • What the above code does: if it finds a URL that is invalid in the sense that it does not begin with http://, https://, ftp:// or ftps://, it temporarily adds http:// to make it a full URL, which can then be safely passed to PHP's built-in URL validation function. If the URL already contains a protocol, it is passed to the validation function as-is. – kittycat May 12 '13 at 11:04
  • "http://googlecom" is valid according to FILTER_VALIDATE_URL. Not sure what your use is, but this isn't what I'd accept as a valid URL – Graham T May 17 '14 at 14:14
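One possible way to handle the `http://googlecom` case raised in the comment above is to additionally require a dot in the host. This is only a sketch under that assumption: the function name `has_dotted_host` is made up here, and the extra rule would also reject legitimate single-label hosts such as `localhost`.

```php
<?php
// Hypothetical stricter check (an assumption, not part of the answer):
// the URL must pass FILTER_VALIDATE_URL and its host must contain a dot.
function has_dotted_host(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() with PHP_URL_HOST returns the host as a string, or null if absent.
    $host = parse_url($url, PHP_URL_HOST);

    return is_string($host) && strpos($host, '.') !== false;
}

var_dump(has_dotted_host('http://googlecom'));
var_dump(has_dotted_host('http://google.com'));
```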
2

You can use the following trick:

$url = "your URL";
// Requires an http, https or ftp scheme, a dotted host, and an optional port.
$validation = "/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i";
if (!preg_match($validation, $url)) {
    echo 'Not a valid URL';
}

I think it may work for you.

Vishal Purohit
-1

Don't use a regex. There's a PHP function for doing what you want.

http://php.net/manual/en/function.parse-url.php
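A minimal sketch of how `parse_url()` might be applied to the inputs in the question. Because `parse_url()` treats a scheme-less string such as `www.website.com/path-to-something` as a path only, the sketch prepends `http://` first; that step, the function name `split_url`, and the returned array shape are assumptions made for illustration.

```php
<?php
// Hypothetical helper (name and return shape are assumptions):
// splits a URL-like string into its host and path.
function split_url(string $candidate): array
{
    // Without a scheme, parse_url() would report the whole string as the path.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    $parts = parse_url($candidate);
    if ($parts === false) {
        // parse_url() returns false on seriously malformed input.
        return ['host' => '', 'path' => ''];
    }

    return [
        'host' => $parts['host'] ?? '',
        'path' => $parts['path'] ?? '',
    ];
}

print_r(split_url('www.website.com/path-to-something'));
```

As the PHP manual notes, `parse_url()` only breaks a string into parts and does not validate it, so this is useful for comparing hosts and paths rather than for deciding whether something is a URL.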

Danack
  • The question is asking how to break down URL-like things so that equivalent links can be checked for going to equivalent places, not whether a URL is valid. He wants to check whether the host is the same, the path is the same, etc. – Danack May 11 '13 at 00:51
  • Actually, we do shorten invalid URLs. We match things that may be URLs, such as when someone uses google.com. Technically that would be invalid, but we just look for that, fix it, then shorten it. parse_url isn't what we need; we already have a solution using regex working exactly like we want it. Thanks for posting your answer anyway. – André Figueira May 12 '13 at 10:57
  • 1
    @Danack parse_url() : _This function is not meant to validate the given URL, it only breaks it up into the above listed parts._ – Fredmat May 21 '15 at 17:07