4

According to the post here, the code below should remove an HTML tag such as <div>. But I found that the end tag </div> still remains in the string.

$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";
$content = preg_replace("/<div[^>]+\>/i", "", $content); 
echo $content;

I have tried the following, but it still does not work. How can I fix this issue?

$content = preg_replace("/<\/div[^>]+\>/i", "", $content); 
$content = preg_replace("/<(/)div[^>]+\>/i", "", $content); 

Thanks

Charles Yeung

4 Answers

9

The end tag doesn't have anything between the div and the >, but [^>]+ requires at least one character there, so it never matches </div>. Instead, try something like:

$content = preg_replace("/<\/?div[^>]*\>/i", "", $content); 

This will remove patterns of the form:

<div>
</div>
<div class=...>
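As a quick check, here is a minimal sketch that runs the question's original pattern and the pattern above against the sample string from the question. The variable names $original and $fixed are only for illustration.

```php
<?php
$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";

// Original pattern: [^>]+ needs at least one character after "div",
// so only the opening tag is removed.
$original = preg_replace("/<div[^>]+>/i", "", $content);

// Adjusted pattern: \/? allows an optional slash,
// and [^>]* allows zero or more characters before the ">".
$fixed = preg_replace("/<\/?div[^>]*>/i", "", $content);

echo $original, "\n"; // this is something with an <img src="test.png"/> in it.</div>
echo $fixed, "\n";    // this is something with an <img src="test.png"/> in it.
```

Note that the pattern matches any tag name that merely starts with "div"; adding \b right after div would restrict it to the div tag itself.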
Rowland Shaw
3

Change it to "/<[\/]*div[^>]*>/i". The [\/]* makes the slash optional, and [^>]* lets the tag have nothing between div and the >.

Desolator
2

If you can guarantee the HTML being passed in will be valid and structured in a certain way, you should be OK with regex.

In general, though, it's best to avoid using regex for working with HTML, because the markup can be so varied and messy. Instead, try using a library like DOMDocument - it handles all the messiness for you.

With DOMDocument you would do something like:

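$doc = new DOMDocument;
$doc->loadHTML($html); // $html holds the markup to clean up
$headerElement = $doc->getElementById('header');
// Note: this removes the element together with its contents, not just its tags
$headerElement->parentNode->removeChild($headerElement);
$amendedHtml = $doc->saveHTML();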
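Since the question wants to drop the div tags but keep what is inside them, a DOM-based variant could "unwrap" each div instead of removing it. The following is only a sketch under that assumption: the container id "unwrap-root" and the variable names are made up for the example, and the serializer may normalize the markup (for instance, how the img tag is written out).

```php
<?php
$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";

$doc = new DOMDocument();
// Wrap the fragment in a known container so it can be found again after parsing.
// The id "unwrap-root" is a hypothetical name used only in this sketch.
$doc->loadHTML('<div id="unwrap-root">' . $content . '</div>');
$root = $doc->getElementById('unwrap-root');

// Copy the node list into an array first: the list is live and
// would change underneath the loop as divs are removed.
$divs = iterator_to_array($root->getElementsByTagName('div'));
foreach ($divs as $div) {
    // Move each child in front of the div, then drop the now-empty div.
    while ($div->firstChild) {
        $div->parentNode->insertBefore($div->firstChild, $div);
    }
    $div->parentNode->removeChild($div);
}

// Serialize only the children of the container, not the container itself.
$result = '';
foreach ($root->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

This is more code than the regex, but it does not depend on how the tags happen to be written in the input.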
Dan Blows
1
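$content = preg_replace("/<\/?(div|b|span)\b[^>]*>/i", "", $content);

The \b keeps the b alternative from also matching tags such as <br> or <body>. This removes all of the following: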

<div...>
</div>
<b....>
</b>
<span...>
</span>
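If the list of tags varies, the same idea can be wrapped in a small function. This is a sketch only: remove_tags is a hypothetical helper name, not a built-in, and it assumes reasonably well-formed tags.

```php
<?php
// Hypothetical helper: removes the opening and closing tags for the given
// tag names and leaves the content between them untouched.
function remove_tags(string $html, array $tags): string
{
    // Escape each name so it is safe inside the pattern ("/" is the delimiter).
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $tags));

    // \b stops a short name such as "b" from also matching <br> or <body>.
    return preg_replace('/<\/?(?:' . $names . ')\b[^>]*>/i', '', $html);
}

echo remove_tags('<div><b>bold</b> and <br> <span class="x">span</span></div>', ['div', 'b', 'span']), "\n";
// bold and <br> span
```

Without the word boundary, the "b" alternative would also strip <br>, which is usually not what is wanted.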