
I have a string containing an HTML document and I want to extract all URLs from it. I tried this:

preg_match_all('/(http:\/\/){1}.{1,}\..{1,}/', $html_document /* a valid document, containing a lot of links*/, $matches);
print_r($matches);

But instead of an array containing all the links, I get chunks of HTML code.
What's wrong with my code?

  • `{1,}` allows for "one to infinite" matches. If your text has two or more URLs, you're allowing a match of **ALL** the text between those two URLs. Or even two `/` will do it: `foo http://example.com/ this is some filler text with a . and /"` will capture the "this is some filler text" – Marc B Aug 12 '14 at 16:45
  • See [What is the best regular expression to check if a string is a valid URL?](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url) – Braj Aug 12 '14 at 16:45
  • possible duplicate of [Extract URLs from text in PHP](http://stackoverflow.com/questions/910912/extract-urls-from-text-in-php) – hlscalon Aug 12 '14 at 16:46
  • Do you want to **validate** or just want to **extract** it? – Braj Aug 12 '14 at 16:47
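To make Marc B's comment concrete, here is a small sketch that runs the question's pattern on a hypothetical one-line HTML snippet (the `$html` string is invented for illustration). Because `.` is greedy and, without the `s` modifier, matches any character except a newline, the match starts at the first `http://` and swallows the rest of the line. In a multi-line document, each match is therefore a URL plus whatever markup follows it on that line, which is the "parts of HTML code" described in the question.

```php
<?php
// The pattern from the question: "http://" followed by greedy ".{1,}" runs.
$pattern = '/(http:\/\/){1}.{1,}\..{1,}/';

// Hypothetical one-line HTML snippet containing two links.
$html = '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>';

preg_match_all($pattern, $html, $matches);

// Only one match: it begins at the first "http://" and, since "." is greedy
// and the snippet has no newline, it runs all the way to the end of the string.
print_r($matches[0]);
```

As a side note, the `{1}` after the group is redundant: `(http:\/\/)` already matches exactly once.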

1 Answer


If you only want to extract the URLs rather than validate them, try the regex below:

\bhttps?:\/\/[^\s]*

Here is an online demo.
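The regex can be unpacked piece by piece with PCRE's `x` (extended) modifier, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. This sketch uses `~` as the delimiter so the slashes need no escaping, drops the `m` modifier (it only changes the meaning of `^` and `$`, which this pattern does not use), and checks on a made-up sample string that the verbose form finds the same matches as the compact one.

```php
<?php
// Verbose form of the answer's pattern; "x" lets each piece carry a comment.
$verbose = '~
    \b        # word boundary before the scheme
    https?    # "http" followed by an optional "s"
    ://       # the literal colon and two slashes
    [^\s]*    # then any run of non-whitespace characters
~ix';

// Compact form, as in the answer (minus the no-op "m" modifier).
$compact = '/\bhttps?:\/\/[^\s]*/i';

// Made-up sample text for the comparison.
$text = 'see http://a.example and https://b.example/path now';

preg_match_all($verbose, $text, $m1);
preg_match_all($compact, $text, $m2);

var_dump($m1[0] === $m2[0]); // bool(true)
```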

Sample code:

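// Double-quoted PHP string, so each regex backslash is written as "\\".
$re = "/\\bhttps?:\\/\\/[^\\s]*/im";
$str = "http://www.regex101.com https://www.stackoverflow.com";

preg_match_all($re, $str, $matches);
print_r($matches[0]); // full-pattern matches: the two URLs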
Braj
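One caveat when the input is HTML, as in the question: `[^\s]*` stops only at whitespace, so a URL inside an attribute such as `href="..."` drags the closing quote and the following markup into the match. The sketch below runs on a hypothetical HTML string and compares the answer's pattern with a tightened character class that also stops at quotes and angle brackets, and uses `+` so a bare `http://` is not matched. That class is an assumption about what delimits URLs in typical markup, not part of the answer; it can still truncate URLs that legitimately contain those characters or miss ones written with HTML entities.

```php
<?php
// Hypothetical HTML input: URLs sit inside quoted attributes next to tags.
$html = '<p><a href="http://example.com/page">link</a> and <img src="https://example.org/i.png"></p>';

// The answer's pattern: [^\s]* stops only at whitespace, so the closing
// quote and the markup after it end up in each match.
preg_match_all('/\bhttps?:\/\/[^\s]*/i', $html, $loose);

// Tightened variant (an assumption, not from the answer): also stop at
// double quotes, single quotes and angle brackets; require one or more chars.
preg_match_all('/\bhttps?:\/\/[^\s"\'<>]+/i', $html, $tight);

print_r($loose[0]);
print_r($tight[0]);
```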