
I need a function to extract a link from an HTML string. Example:

String:

<!-- BEGIN PARTNER PROGRAM - DO NOT CHANGE THE PARAMETERS OF THE HYPERLINK -
-> <a href='http://www.link.com' target='_blank'>text</a> <img 
src='http://www.linkimage.com' BORDER='0' WIDTH='1' HEIGHT='1' /> <!-- END 
PARTNER PROGRAM --> 

I need to extract:

http://www.link.com

Thanks.

Pau
  • https://www.mkyong.com/regular-expressions/how-to-extract-html-links-with-regular-expression/ – mic Nov 03 '17 at 09:18
  • Fails the obligatory "what have you tried and where are you stuck?" question... Hint: don't use a [RegExp](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) but [DOMDocument](http://php.net/manual/en/class.domdocument.php) – CD001 Nov 03 '17 at 09:19

2 Answers

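// Sample HTML from the question.
$string = "<!-- BEGIN PARTNER PROGRAM - DO NOT CHANGE THE PARAMETERS OF THE HYPERLINK -
-> <a href='http://www.link.com' target='_blank'>text</a> <img 
src='http://www.linkimage.com' BORDER='0' WIDTH='1' HEIGHT='1' /> <!-- END 
PARTNER PROGRAM --> ";

// Take everything after the first "<a href='", then cut at the closing quote.
$link = explode('<a href=\'', $string)[1];
$link = explode('\'', $link)[0];
echo $link, PHP_EOL; // http://www.link.com

// Same approach for the image URL after "src='".
$linkimage = explode('src=\'', $string)[1];
$linkimage = explode('\'', $linkimage)[0];
echo $linkimage, PHP_EOL; // http://www.linkimage.com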
Teo Mihaila

The quick and dirty way:

preg_match_all('~href=([\'"])([^\'"]+)\\1~is', $htmlString, $matches); 

print_r($matches[2]);

The proper way:

Use a real parser, for example DOMDocument::getElementsByTagName (http://php.net/manual/en/domdocument.getelementsbytagname.php) or SimpleXMLElement::xpath (http://php.net/manual/en/simplexmlelement.xpath.php).

The problem with the proper way is that you need to tidy the HTML before parsing. In some cases even PHP's native Tidy extension (http://php.net/manual/en/book.tidy.php) fails to do that correctly.
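
For illustration, here is a minimal sketch of the DOM-based approach mentioned above. The helper name extractLinks is hypothetical (it is not from the answer), and the sample string is assumed to sit on a single line so that the leading comment closes with a proper `-->`. If the comment were broken across lines as `-` / `->`, a parser would likely treat the whole fragment as one comment and find no link.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in an HTML fragment.
function extractLinks(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Assumption: the question's sample, shortened and joined onto one line.
$htmlString = "<!-- BEGIN PARTNER PROGRAM --> <a href='http://www.link.com' target='_blank'>text</a> <img src='http://www.linkimage.com' BORDER='0' WIDTH='1' HEIGHT='1' /> <!-- END PARTNER PROGRAM -->";

print_r(extractLinks($htmlString));
```

Because only `<a>` elements are queried, the `src` of the `<img>` tag is ignored; the same loop over `getElementsByTagName('img')` with `getAttribute('src')` would pick that one up.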

acidofil