How to strip a tag and all of its inner html using the tag's id?

Question

I have the following html:

<html>
 <body>
 bla bla bla bla
  <div id="myDiv"> 
         more text
      <div id="anotherDiv">
           And even more text
      </div>
  </div>

  bla bla bla
 </body>
</html>

I want to remove everything from <div id="anotherDiv"> through its closing </div>, including the content in between. How do I do that?

There seems to be an edit war on this page. Please clarify this unclear question so that researchers can benefit. – mickmackusa, Nov 22 '19 at 22:00
There is a big difference between removing a single, specific element versus removing all tags with a specific tagname. — mickmackusa, Nov 22 '19 at 22:15
Every regex solution to this question is incorrect, for any interpretation of this question, and will fail in surprising ways on many different inputs. You need a DOM parser, as the accepted answer uses. Whether you thought the question wanted to strip a `<div>`, or strip an element by its ID, neither option can be accomplished correctly with a regular expression. – user229044, Nov 28 '19 at 03:12
Consider stripping `…` (by tag or by ID) from `…` with a regex. Or `…`, or any other number of simple cases that will break a regex-based solution. – user229044, Nov 28 '19 at 03:35

score 34 · Accepted Answer · answered Jul 22 '10 at 12:10

With native DOM

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if($nodes->item(0)) {
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}
echo $dom->saveHTML();
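
The snippet above can be wrapped in a small reusable function. This is a minimal sketch and not part of the original answer: the function name `removeElementById` is hypothetical, and silencing parser warnings with `libxml_use_internal_errors()` is an assumption about what callers dealing with imperfect markup usually want.

```php
<?php
// Hypothetical helper (not from the original answer): removes the first
// element whose id attribute equals $id, together with all of its content.
function removeElementById(string $html, string $id): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    // Assumption: $id contains no double quote, so it can be embedded as-is.
    $nodes = $xPath->query('//*[@id="' . $id . '"]');

    if ($nodes !== false && $nodes->item(0) !== null) {
        $node = $nodes->item(0);
        $node->parentNode->removeChild($node);
    }

    return $dom->saveHTML();
}

$html = '<html><body>bla<div id="myDiv">more text'
      . '<div id="anotherDiv">And even more text</div></div>bla</body></html>';

echo removeElementById($html, 'anotherDiv');
```

Because `saveHTML()` serializes the whole document, the output may also contain a doctype added by the parser.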

answered Jul 22 '10 at 12:10

Gordon


What do I have to modify if I want to remove all div tags in a DOM? – Sisir Nov 19 '11 at 08:51
@Sisir see http://stackoverflow.com/questions/4177376/delete-all-elements-of-a-certain-type-from-an-xml-doc-using-php/4177407#4177407 – Gordon Nov 19 '11 at 09:10
Yes, this works a treat. I've always wanted to be able to remove an HTML tag from a string of HTML, much like a jQuery $(selector#id).remove(). This is just brilliant! – azzy81 Mar 09 '12 at 07:50
@SubstanceD if you want selectors check out [phpQuery, Zend_Dom or QueryPath](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php/3577662#3577662). Personally, I prefer [XPath](http://schlitt.info/opensource/blog/0704_xpath.html). – Gordon Mar 09 '12 at 08:43
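
For the case raised in the comments (removing every div rather than one element by id), the same DOM approach can be sketched as follows. This is an illustration under assumptions, not the code from the linked answer: the sample markup is made up, and copying the matches into an array before removing them is a defensive choice.

```php
<?php
// Made-up sample markup with a nested div.
$html = '<body><p>keep</p><div>outer<div>inner</div></div><p>also keep</p></body>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

// Copy the matches into a plain array first, so that removing nodes
// cannot disturb the iteration.
$divs = iterator_to_array($xPath->query('//div'));

foreach ($divs as $div) {
    // A nested div whose ancestor was already removed still has a parent
    // node (the detached ancestor), so this call stays safe.
    $div->parentNode->removeChild($div);
}

echo $dom->saveHTML();
```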

score 14 · Answer 2 · edited Oct 05 '12 at 14:30

You can use preg_replace() like:

$string = preg_replace('/<div id="someid"[^>]+\>/i', "", $string);

edited Oct 05 '12 at 14:30

Florent


answered Jul 22 '10 at 12:00

Haim Evgi


this will remove all `div`s and not only the specified one. – jigfox Jul 22 '10 at 12:11
You don't specify anywhere that it must remove the div with the ID=myDiv? – rockstardev Jul 22 '10 at 12:11
@HaimEvgi Is there any way to remove the inner content as well? For example, with p tags the tags are removed, but the content of the p tags remains. – avolquez Nov 29 '12 at 19:29
This rocks, but is there any way to remove the closing tag? – hakazvaka Apr 22 '13 at 15:10
Here is a simple way to strip specific tags(both open & closing): https://gist.github.com/tedicela/0b06265eefb8df41cb8256bb3f442916 – Tedi Çela Dec 09 '16 at 14:44
This answer DEFINITELY doesn't do what the OP requires. 16 UVs means that lots of researchers have been misinformed and don't understand the question and/or what this answer does. This answer does far more harm than good. The overarching message should be that developers should use a dom parser to manipulate valid html. – mickmackusa Nov 21 '19 at 21:43
Question says: _I want to remove everything starting from `<div id="anotherDiv">` until its closing `</div>`. How do I do that?_ **This answer is incorrect.** – mickmackusa Nov 21 '19 at 21:46
This is incorrect and fails for `…`. You cannot use a regex for this. – user229044 Nov 28 '19 at 03:45

RafaSashi · Answer 3 · 2019-11-28T16:54:04.920

Using the native XML Manipulation Library

Assuming that your html content is stored in the variable $html:

$html='<html>
 <body>
 bla bla bla bla
  <div id="myDiv"> 
         more text
      <div id="anotherDiv">
           And even more text
      </div>
  </div>

  bla bla bla
 </body>
</html>';

To delete the tag by ID use the following code:

    $dom = new DOMDocument;
    $dom->validateOnParse = false;
    $dom->loadHTML($html);

    // get the tag
    $div = $dom->getElementById('anotherDiv');

    // delete the tag
    if ($div && $div->nodeType == XML_ELEMENT_NODE) {
        $div->parentNode->removeChild($div);
    }

    echo $dom->saveHTML();

Note that certain versions of libxml require a doctype to be present in order to use the getElementById method.

In that case you can prepend a doctype declaration to $html:

$html = '<!DOCTYPE html>' . $html;

Alternatively, as suggested by Gordon's answer, you can use DOMXPath to find the element using the xpath:

$dom = new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$col = $xp->query('//div[ @id="anotherDiv" ]');

// query() returns false for an invalid expression; empty() is always
// false for an object, so it cannot be used to test the result
if ($col !== false) {
    foreach ($col as $node) {
        $node->parentNode->removeChild($node);
    }
}

echo $dom->saveHTML();

The first method works regardless of the tag name. To use the second method with the same id but a different tag, say form, replace //div in //div[ @id="anotherDiv" ] with //form.

score 0 · Answer 4 · answered Jul 22 '10 at 12:01

strip_tags() function is what you are looking for.

http://us.php.net/manual/en/function.strip-tags.php

answered Jul 22 '10 at 12:01

ItsPronounced


strip_tags() doesn't work the way he wants it to. strip_tags() allows for certain exclusions, but why would you use that when you only want to exclude one tag and include all other tags? – Haim Evgi Jul 22 '10 at 12:02
From his question, I couldn't really tell what tags he was trying to remove. It seemed as if he wanted to remove everything. Thanks for the input. – ItsPronounced Jul 22 '10 at 12:03
Ahhh, using chrome. His inline markup didn't show up. I just checked it in firefox and I see his inline markup. You are correct :) Any reason why it didn't show up in chrome? – ItsPronounced Jul 22 '10 at 12:06
strip_tags() worked best for me. Thanks. The reason it worked best for me is that I had tags that had no spaces. It was the easiest by far. – Alex Spencer Dec 19 '12 at 02:24
Question says: _I want to remove everything starting from `<div id="anotherDiv">` until its closing `</div>`. How do I do that?_ **This answer is incorrect.** – mickmackusa Nov 21 '19 at 21:48
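
The objection in these comments can be checked with a short sketch. The sample string below is a condensed version of the markup in the question (an assumption made to keep the output readable): `strip_tags()` removes the tags themselves but keeps all of the text inside them, so the content of the targeted element survives.

```php
<?php
// Condensed version of the question's markup (whitespace removed).
$html = '<div id="myDiv">more text<div id="anotherDiv">And even more text</div></div>';

// strip_tags() drops every tag but leaves the text between the tags.
$result = strip_tags($html);

echo $result; // prints "more textAnd even more text"
```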

score -1 · Answer 5 · answered May 02 '16 at 11:42

I wrote these to strip specific tags and attributes. Since they're regex they're not 100% guaranteed to work in all cases, but it was a fair tradeoff for me:

// Strips only the given tags in the given HTML string.
function strip_tags_blacklist($html, $tags) {
    foreach ($tags as $tag) {
        $regex = '#<\s*' . $tag . '[^>]*>.*?<\s*/\s*'. $tag . '>#msi';
        $html = preg_replace($regex, '', $html);
    }
    return $html;
}

// Strips the given attributes found in the given HTML string.
function strip_attributes($html, $atts) {
    foreach ($atts as $att) {
        $regex = '#\b' . $att . '\b(\s*=\s*[\'"][^\'"]*[\'"])?(?=[^<]*>)#msi';
        $html = preg_replace($regex, '', $html);
    }
    return $html;
}

answered May 02 '16 at 11:42

Aram Kocharyan


Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. Iterated `preg_` calls are going to be inefficient. The `m` pattern modifier is of no use. – mickmackusa Nov 21 '19 at 21:50
This answer does not target the tag using the `id` as stated in the question. This answer is incorrect because it will remove elements that should not be removed. – mickmackusa Nov 21 '19 at 22:06
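
One concrete failure of this kind of pattern is nested elements. The sketch below uses a made-up input string together with the pattern that `strip_tags_blacklist()` builds for the tag `div`: the lazy `.*?` stops at the first closing tag, so the outer element is only partially removed and a dangling `</div>` is left behind.

```php
<?php
// Made-up input: a div nested inside another div.
$html = '<div id="a">x<div id="b">y</div>z</div>';

// The pattern strip_tags_blacklist() generates for the tag "div".
$regex = '#<\s*div[^>]*>.*?<\s*/\s*div>#msi';

// The match runs from the outer opening tag to the INNER closing tag.
$result = preg_replace($regex, '', $html);

echo $result; // prints "z</div>"
```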

score -1 · Answer 6 · edited Jun 22 '17 at 06:32

how about this?

// Strips only the given tags in the given HTML string.
function strip_tags_blacklist($html, $tags) {
    $html = preg_replace('/<'. $tags .'\b[^>]*>(.*?)<\/'. $tags .'>/is', "", $html);
    return $html;
}

edited Jun 22 '17 at 06:32

Community


answered Jun 02 '17 at 06:03

Hoàng Vũ Tgtt


Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. There is no reason to declare `$html` (a single-use variable); just `return preg_replace(...);` This snippet will fail when a tag attribute value contains `>`. There is no need to use a capture group. – mickmackusa Nov 21 '19 at 21:53
This answer does not target the tag using the `id` as stated in the question. This answer is incorrect because it will remove elements that should not be removed. – mickmackusa Nov 21 '19 at 22:07
This is incorrect and fails for many kinds of input, for example `strip_tags_blacklist('…foo…', 'div')` => `…` – user229044 Nov 28 '19 at 03:49

score -1 · Answer 7 · answered Apr 24 '19 at 09:34

Following RafaSashi's answer using preg_replace(), here's a version that works for a single tag or an array of tags:

/**
 * @param $str string
 * @param $tags string | array
 * @return string
 */

function strip_specific_tags ($str, $tags) {
  if (!is_array($tags)) { $tags = array($tags); }

  foreach ($tags as $tag) {
    $_str = preg_replace('/<\/' . $tag . '>/i', '', $str);
    if ($_str != $str) {
      $str = preg_replace('/<' . $tag . '[^>]*>/i', '', $_str);
    }
  }
  return $str;
}

answered Apr 24 '19 at 09:34

Jonathan Land


Question says: _I want to remove everything starting from `<div id="anotherDiv">` until its closing `</div>`. How do I do that?_ **This answer is incorrect.** – mickmackusa Nov 21 '19 at 21:55
This answer does not target the tag using the `id` as stated in the question. This answer is incorrect because it will remove elements that should not be removed. – mickmackusa Nov 21 '19 at 22:07
