47

How can we use PHP to identify URLs in a string and store them in an array?

We cannot use the explode function: if a URL contains a comma, it won't give correct results.

Azraar Azward
  • See also https://stackoverflow.com/a/11588614/1066234 – Avatar May 23 '20 at 14:48
  • 2
    `preg_match_all("/\b((https?):\/\/)?([a-z0-9-.]*)\.([a-z]{2,3})([-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$])/i", $string, $match);` use this one – Harsh Patel Nov 22 '21 at 07:22

6 Answers

125

Regex is the answer to your problem. Taking Object Manipulator's answer, all it is missing is the exclusion of commas, so you can try this code, which excludes them and gives 3 separate URLs as output:

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
print_r($match[0]); 
echo "</pre>";

and the output is

Array
(
    [0] => http://google.com
    [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0
    [2] => https://instagram.com/hellow/
)
aampudia
  • 6
    Maybe you'd want to make it case-insensitive by adding the `i` modifier. ie. `...#i'` – MrWhite Apr 12 '16 at 11:23
  • 1
    Just a note, some URLs use commas in their query strings – relipse Apr 12 '16 at 19:33
  • @aampudia: Very good approach. But is there a simple way to find URLs without a protocol, too? Like: "The text you want to filter goes here. www.google.de, www.youtube.com". – Marco Dec 12 '16 at 11:32
  • @Marco there is... but it depends on how the URLs are going to be received. Will they have the protocol, but you don't want to capture it? Or will the URLs not have the protocol at all? – aampudia Dec 13 '16 at 00:30
  • @aampudia: I intend to have a textarea where a visitor can enter text, and I want to prevent them from adding links. I want to remove all URLs from this string, even if the visitor doesn't type the protocol. I know that "www" is just a subdomain, but it is the most common one. It's also clear that the visitor can just type google.com, or www(.)google(.)com, and the function will not work ... – Marco Dec 15 '16 at 16:43
  • 1
    Note that URLs don't always include `http` or `https`, since they can also begin with only `//`. – ashleedawg Dec 19 '18 at 18:05
  • this doesn't return trailing _ from URL. e.g. my url is `https://example.com/path/param_` but in `$match` it returns only `https://example.com/path/param` Any suggestions ? – shyammakwana.me Jul 29 '19 at 05:08
  • @shyammakwana.me you have to delete the [:punct:] part of the regular expression. It tells the pattern to ignore all punctuation; if you remove it, the match will include the underscore at the end. – aampudia Aug 06 '19 at 02:09
  • To extract URLs from a webpage using PHP, please follow this link: https://infoconic.com/blog/extract-urls-from-webpage-using-php/ – Infoconic Technologies Nov 15 '20 at 18:10
  • This regex breaks when a URL contains brackets or an apostrophe ('). A good URL should be encoded, but sometimes we have lots of data to process, and that's where this solution didn't work. – Rafee Jun 22 '21 at 05:00
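Following up on the comments above, here is a minimal sketch (not the answerer's code) that adds the `i` modifier suggested by MrWhite and also accepts URLs that start with `www.` and no protocol, as Marco asked. The function name `extractUrls` and the sample text are made up for illustration, and treating `www.` as the only protocol-less prefix is an assumption.

```php
<?php
// Hypothetical helper: same pattern as the answer above, plus two changes:
//   1. (?:https?://|www\.) also accepts a bare "www." prefix (assumption)
//   2. the trailing "i" modifier makes the match case-insensitive
function extractUrls(string $text): array
{
    $pattern = '#\b(?:https?://|www\.)[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#i';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches, one entry per URL found.
    return $matches[0];
}

// Made-up sample text mixing the three cases.
$sample = 'Visit www.google.de, HTTP://Example.com/Path and https://www.youtube.com/watch?v=K_m7NEDMrV0.';
print_r(extractUrls($sample));
```

Like the original pattern, this still cuts a URL at the first comma and drops a trailing underscore or other punctuation, so the caveats raised in the comments continue to apply.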
9

Please try the regex below:

$regex = '/https?\:\/\/[^\",]+/i';
preg_match_all($regex, $string, $matches);
echo "<pre>";
print_r($matches[0]); 
echo "</pre>";

Hope this will work for you

JiteshNK
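One thing to watch with the pattern above: its character class only stops at a double quote or a comma, so a match keeps going through spaces when more text follows the URL. A small sketch comparing it with a variant that also stops at whitespace; the `\s` addition and the sample text are assumptions, not part of the answer.

```php
<?php
// Made-up sample: the second URL is followed by ordinary words.
$text = 'See http://google.com, https://instagram.com/hellow/ for details';

// Pattern from the answer: stops only at a double quote or a comma.
preg_match_all('/https?\:\/\/[^\",]+/i', $text, $loose);

// Assumed variant: additionally stops at any whitespace character.
preg_match_all('/https?\:\/\/[^\",\s]+/i', $text, $strict);

print_r($loose[0]);  // second entry also contains " for details"
print_r($strict[0]); // second entry ends at the URL itself
```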
5

You can try this regex:

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
print_r($match[0]); 
echo "</pre>";

This gives the following output:

Array
(
  [0] => http://google.com
  [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/
)
Indrasis Datta
4

Try this:

function getUrls($string)
{
    $regex = '/https?\:\/\/[^\" ]+/i';
    preg_match_all($regex, $string, $matches);
    return ($matches[0]);
}
$urls = getUrls($string);
print_r($urls);

or

$str = '<a href="http://foobar.com"> | Hello world Im a http://google.fr |     Did you mean:http://google.fr/index.php?id=1&b=6#2310';
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';
if (preg_match_all($pattern,$str,$matches)) 
{
    print_r($matches[1]);
}

It will work.

khan
  • No, it's still giving 2 results. There are 3 URLs but only 2 are returned. Can you see? `Array ( [0] => http://google.com, [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/ )` – Azraar Azward Apr 12 '16 at 05:38
  • http://stackoverflow.com/questions/4390556/extract-url-from-string Maybe this will help you. – khan Apr 12 '16 at 05:42
  • Can you provide an example with that regex? – Azraar Azward Apr 12 '16 at 05:48
  • No, it doesn't work for my string. `$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";` – Azraar Azward Apr 12 '16 at 06:01
4
$urlstring = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $urlstring , $result);

print_r($result[0]); 
MrWhite
Prassd Nidode
2
$string = "The text you want to filter goes here. http://google.com,
https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#',
$string, $match);

echo "<pre>";
$arr = explode(",", $match[0][1]);
print_r($match[0][0]);
print_r($arr);
echo "</pre>";
Prassd Nidode
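The pattern without the comma exclusion leaves comma-joined URLs in a single match, and a plain explode on "," would also break URLs that legitimately contain commas, as noted in the comments above. Here is a minimal sketch of one possible workaround, assuming that a comma only separates two URLs when it is immediately followed by `http://` or `https://`. The function name `extractUrlsKeepingCommas` and the example.com / example.org URLs are hypothetical.

```php
<?php
// Hypothetical helper: match with the comma-tolerant pattern from the
// answers above, then split each match only at commas that start a new URL.
function extractUrlsKeepingCommas(string $text): array
{
    preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $text, $match);

    $urls = [];
    foreach ($match[0] as $chunk) {
        // The lookahead (?=https?://) keeps the scheme in the next piece and
        // leaves commas inside a query string untouched (assumed heuristic).
        foreach (preg_split('#,(?=https?://)#i', $chunk) as $url) {
            $urls[] = $url;
        }
    }

    return $urls;
}

// Made-up input: the first URL has commas in its query string.
print_r(extractUrlsKeepingCommas('https://example.com/map?ll=1,2,https://example.org/'));
```

This heuristic would still split incorrectly if a query string itself contained `,http://`, so it is a trade-off rather than a complete fix.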