I am trying to remove a specific URL from some text while keeping the text or HTML tags inside the anchor tag. I can remove the URL, but the text or HTML between the anchor tags is removed along with it. Here is the code I use to remove the specific URL:

preg_replace('|<a [^>]*href="http://www.microsoft.com[^"]*"[^>]*>.*</a>|iU', '', $a);

Here is the sample input:

<a href="http://www.microsoft.com/">   <img src="http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250" class="mscom-image feature-image" alt="Learn about Lumia 950 and Lumia 950 XL." width="540" height="304">   </a>

I want to keep the img tag, or any text, inside an anchor tag that has the specific URL.

Did I make a mistake in my code? Please correct me. I need a regex solution in PHP.

Maninderpreet Singh
Shawon

1 Answer

Here we go again... Don't use regexes to parse HTML; use an HTML parser, DOMDocument for example:

$html = <<< EOF
<a href="http://www.microsoft.com/">   <img src="http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250" class="mscom-image feature-image" alt="Learn about Lumia 950 and Lumia 950 XL." width="540" height="304">  SOME TEXT </a>
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every <a> whose href contains "microsoft.com"
foreach ($xpath->query("//a[contains(@href,'microsoft.com')]") as $element) {
    $img = $xpath->query('./img', $element)->item(0); // first <img> directly inside the <a>
    echo $img->getAttribute('src'), PHP_EOL;   // img source
    echo $img->getAttribute('alt'), PHP_EOL;   // img alt text
    echo trim($element->textContent), PHP_EOL; // text inside the a tag
}
// Output:
//http://c.s-microsoft.com/en-in/CMSImages/MMD_TCFamily_1006_540x304.jpg?version=ac2c5995-fde2-b40b-3f2a-b6a0baa88250
//Learn about Lumia 950 and Lumia 950 XL.
//SOME TEXT
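
The question asks to drop the anchor itself while keeping whatever is inside it. A minimal sketch of that with DOMDocument follows; the function name `unwrapAnchors`, the wrapper id `unwrap-root`, and the `$needle` parameter are illustrative assumptions, not part of the original code. It assumes `$needle` contains no single quotes and that the input is ASCII or otherwise safe for `loadHTML`'s default encoding handling.

```php
<?php
// Hypothetical helper: remove every <a> whose href contains $needle,
// but keep its child nodes (images, text) in the same position.
function unwrapAnchors(string $html, string $needle): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect markup
    // Wrap the fragment in a known container so it can be extracted again.
    $dom->loadHTML('<div id="unwrap-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Copy the node list first, because the DOM is modified inside the loop.
    // Assumption: $needle contains no single quote.
    $anchors = iterator_to_array($xpath->query("//a[contains(@href, '$needle')]"));
    foreach ($anchors as $a) {
        // Move each child in front of the anchor, then drop the empty anchor.
        while ($a->firstChild) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the children of the wrapper, not the added html/body.
    $root = $xpath->query("//div[@id='unwrap-root']")->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling `unwrapAnchors($html, 'microsoft.com')` on the sample should return the img tag and the text without the surrounding anchor; anchors pointing elsewhere are left untouched.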

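
If a regex really is required, as the question asks, the likely problem in the original call is that the inner content is matched but never captured, and the whole match is replaced with an empty string. The sketch below adds a capture group and substitutes it back; the function name `stripMicrosoftAnchors` is a hypothetical wrapper, and the pattern assumes double-quoted href attributes and no nested anchors, so it will break on less regular markup.

```php
<?php
// Hypothetical wrapper around the question's preg_replace call.
function stripMicrosoftAnchors(string $a): string
{
    // (.*) captures everything between the tags; '$1' puts it back.
    // i = case-insensitive, s = dot also matches newlines, U = lazy quantifiers.
    // Dots in the host name are escaped so they match literally.
    return preg_replace(
        '|<a\s[^>]*href="http://www\.microsoft\.com[^"]*"[^>]*>(.*)</a>|isU',
        '$1',
        $a
    );
}

// Shortened stand-in for the sample from the question.
$a = '<a href="http://www.microsoft.com/">   <img src="x.jpg" alt="demo">  SOME TEXT </a>';
echo stripMicrosoftAnchors($a);
```

The `s` modifier matters when the anchor spans several lines, which the original `iU` flags would not handle.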

Pedro Lobito