PHP - Strings - Remove a HTML tag with a specific class, including its contents

Question

I have a string like this:

<div class="container">
  <h3 class="hdr"> Text </h3>
  <div class="main">
    text
    <h3> text... </h3>
    ....

  </div>
</div>

How do I remove the h3 tag with the .hdr class using as little code as possible?

The author is right. He wants to send output without that HTML element. JavaScript modifies it, but it must not be present. – Alex, Jun 30 '10 at 13:31
Well, I need to modify the generated output of a WordPress function. JS is not a good way to do that... – Alex, Jun 30 '10 at 13:32
Possible duplicate of [php regexp: remove all atributes from an html tag](http://stackoverflow.com/questions/3026096/php-regexp-remove-all-atributes-from-an-html-tag); it can easily be adapted to remove the entire node instead of just the attribute. See my DOM solution. – Gordon, Jun 30 '10 at 13:34
The OP asked for a server-side solution. We know that jQuery is always the answer... http://meta.stackexchange.com/questions/45176/when-is-use-jquery-not-a-valid-answer-to-a-javascript-question – Clement Herreman, Jun 30 '10 at 13:34
JavaScript may be disabled on the user's PC. Why should that user see that element when it's not intended to be there? – Alex, Jun 30 '10 at 13:37

score 23 · Accepted Answer · answered Jun 30 '10 at 13:37


Using as little code as possible? Shortest code isn't necessarily best. However, if your HTML h3 tag always looks like that, this should suffice:

$html = preg_replace('#<h3 class="hdr">(.*?)</h3>#', '', $html);

Generally speaking, using regex for parsing HTML isn't a particularly good idea though.
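As a quick check, here is the pattern above applied to the question's sample markup (with the `....` placeholder dropped). It is a sketch that assumes the opening tag is written exactly as `<h3 class="hdr">` and sits on one line; the lazy `(.*?)` stops at the first `</h3>`, so the plain `<h3>` inside `div.main` is left alone.

```php
<?php
// The question's sample markup (the "...." placeholder dropped).
$html = <<<HTML
<div class="container">
  <h3 class="hdr"> Text </h3>
  <div class="main">
    text
    <h3> text... </h3>
  </div>
</div>
HTML;

// Lazy (.*?) stops at the first </h3> after <h3 class="hdr">, so only
// that element is removed; the plain <h3> inside div.main is kept.
$result = preg_replace('#<h3 class="hdr">(.*?)</h3>#', '', $html);

echo $result;
```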

answered Jun 30 '10 at 13:37

Daniel Egeberg


Thanks, that works! But why do you say using regex is not a good idea? Is it because it takes more CPU? The string I'm talking about can be quite large; it's an output buffer from a function that should write something on the screen. – Alex Jun 30 '10 at 13:40

@Alex because HTML is not regular. Use DOM if you want to work with HTML. There is an example in the linked duplicate. – Gordon Jun 30 '10 at 13:42
in this case, everything besides what's inside div.main is regular :) tx – Alex Jun 30 '10 at 13:46
Maybe you should consider "/<h3 class="hdr">(.*?)<\/h3>/i"; the "i" is just to ignore case. – ddjikic Oct 08 '14 at 23:18

Important: this only works if the opening and closing tags are on the same line! Line breaks in between are handled by using `$html = preg_replace('#<h3 class="hdr">(.*?)</h3>#si', '', $html);`, where s makes . match line-break characters too and i makes the match case-insensitive. – Avatar Jul 13 '17 at 12:25
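A small sketch of the difference the `s` modifier makes, using a made-up `h3.hdr` whose tags sit on different lines: without `s`, `.` does not match a newline, so the pattern finds nothing and the string comes back unchanged; with `#si` the whole element is removed. The input string is hypothetical.

```php
<?php
// Hypothetical h3.hdr whose opening and closing tags are on different lines.
$html = "<div>\n<h3 class=\"hdr\">\n  Text\n</h3>\n<p>keep</p>\n</div>";

// Without s, "." does not match "\n", so nothing is removed.
$noS = preg_replace('#<h3 class="hdr">(.*?)</h3>#', '', $html);

// With s, "." also matches newlines, so the whole element is removed.
// i additionally makes the match case-insensitive (e.g. <H3 CLASS="hdr">).
$withSi = preg_replace('#<h3 class="hdr">(.*?)</h3>#si', '', $html);

var_dump($noS === $html);
echo $withSi;
```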

score 3 · Answer 2 · answered Jun 30 '10 at 13:38


Something like this is what you're looking for...

$output = preg_replace("#<h3 class=\"hdr\">(.*?)</h3>#is", "", $input);

The "is" modifiers at the end of the regex make it more flexible: i makes the match case-insensitive, and s lets . match line breaks, so the h3 may span several lines.

answered Jun 30 '10 at 13:38

Ben



and you forgot \ it should be <\/h3> – ddjikic Oct 08 '14 at 23:26

score 1 · Answer 3 · answered Jan 16 '20 at 09:09

Stumbled upon this via Google. For anyone else who feels dirty using regex to parse HTML, here's a DOMDocument solution I feel much safer going with:

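function removeTagByClass(string $html, string $className) {
    // Parse the markup; loadHTML() adds implied <html>/<body> wrappers
    // and a default doctype around a fragment.
    $dom = new \DOMDocument();
    $dom->loadHTML($html);
    $finder = new \DOMXPath($dom);

    // Match elements whose class list contains $className as a whole word.
    $nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' {$className} ')]");

    // Detach every matching element (and its contents) from its parent.
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the modified document back to an HTML string.
    return $dom->saveHTML();
}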

Thanks to this other answer for the XPath query.
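If the input is a fragment like the one in the question and you want the fragment back without the wrappers and doctype that loadHTML() adds, the same approach can be sketched with the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags. This assumes the dom extension, libxml 2.7.8 or newer, a fragment with a single root element, ASCII input (non-ASCII text may need a charset hint, since loadHTML() does not assume UTF-8), and a class name without quote characters. removeByClassFragment() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: remove every element carrying $className from an
 * HTML fragment. Assumes a single root element and a quote-free class
 * name (XPath 1.0 string literals cannot be escaped).
 */
function removeByClassFragment(string $html, string $className): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // NOIMPLIED/NODEFDTD stop libxml from adding <html><body> and a doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$className} ')]";

    // query() returns a snapshot, so removing nodes while looping is safe.
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    return (string) $dom->saveHTML();
}

$html = '<div class="container"><h3 class="hdr"> Text </h3>'
      . '<div class="main">text<h3> text... </h3></div></div>';

echo removeByClassFragment($html, 'hdr');
```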

score 0 · Answer 4 · answered Jun 30 '10 at 13:42

Try a preg_match(), then a preg_replace() with the following pattern:

/(<h3
[\s]+
[^>]*?
class=[\"\'][^\"\']*?hdr[^\"\']*?[\"\']
[^>]*?>
[\s\S\d\D\w\W]*?
<\/h3>)/ix

It's messy, and it should work fine only if the h3 tag doesn't contain inline JavaScript with character sequences that this regular expression would react to. It is far from perfect, but in simple cases where the h3 tag is used it should work.

Haven't tried it though, might need adjustments.

Another way, if it's possible, would be to copy that function and use your own copy without the h3.
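Written across several lines like that, the pattern needs the x (extended) modifier so PCRE ignores the line breaks and indentation between its pieces. A sketch of it in use, with a hypothetical h3 that has an extra attribute, two classes and a line break inside; `[\s\S]*?` here is equivalent to the original `[\s\S\d\D\w\W]*?`. Note that it also matches class values that merely contain "hdr", such as "subhdr".

```php
<?php
// The multi-line pattern, compiled with x so the whitespace and newlines
// between its pieces are ignored. [\s\S]*? lazily matches any character,
// newlines included, up to the first </h3>.
$pattern = '/(<h3
    [\s]+
    [^>]*?
    class=["\'][^"\']*?hdr[^"\']*?["\']
    [^>]*?>
    [\s\S]*?
    <\/h3>)/ix';

// Hypothetical input: the second <h3> has no class and must survive.
$html = "<h3 id=\"t\" class=\"big hdr\">\n  Text\n</h3><h3>keep</h3>";

$result = preg_replace($pattern, '', $html);

echo $result;
```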

score 0 · Answer 5 · answered Oct 08 '18 at 06:16

This may help someone if the solutions above don't work. It removes an iframe and content with '-webkit-overflow-scrolling: touch;', as in my case :)

A regular expression (regex) describes what you want to remove, and PHP's preg_replace() replaces every matching div with something else (or with nothing). In the examples below, $incoming_data holds your content before the elements are removed, and $result is the final product. Basically, we tell the code to find every div with class="myclass" and replace it with ' ' (a single space).

How to remove a div and its contents by class in PHP

Just change "myclass" to whatever class your div has.

$result = preg_replace('#<div class="myclass">(.*?)</div>#', ' ', $incoming_data);
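One thing to keep in mind with this pattern (this check is not from the linked source): the lazy `(.*?)` stops at the first `</div>`, so a div.myclass that contains another div is only partly removed, leaving its tail and closing tag behind. A sketch with hypothetical input:

```php
<?php
// Hypothetical div.myclass that itself contains another div.
$incoming_data = '<div class="myclass"><div>inner</div> tail</div><p>keep</p>';

// (.*?) stops at the FIRST </div> (the inner one), so " tail</div>"
// from the outer div is left in the result.
$result = preg_replace('#<div class="myclass">(.*?)</div>#', ' ', $incoming_data);

echo $result;
```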

How to remove a div and its contents by ID in PHP

Just change "myid" to whatever ID your div has.

$result = preg_replace('#<div id="myid">(.*?)</div>#', ' ', $incoming_data);

Does your div have multiple classes? Then match on its ID instead; just change "myid" to whatever ID your div has, like this:

$result = preg_replace('#<div id="myid(.*?)</div>#', ' ', $incoming_data);

Or, if the div doesn't have an ID, filter on its first class, like this:

$result = preg_replace('#<div class="myclass(.*?)</div>#', ' ', $incoming_data);

How to remove all headings in PHP

This is how to remove all h1 headings:

$result = preg_replace('#<h1>(.*?)</h1>#', ' ', $incoming_data);

And if the heading has a class, do something like this:

$result = preg_replace('#<h1 class="myclass">(.*?)</h1>#', ' ', $incoming_data);

Source: http://www.lets-develop.com/html5-html-css-css3-php-wordpress-jquery-javascript-photoshop-illustrator-flash-tutorial/php-programming/remove-div-by-class-php-remove-div-contents/
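To cover every heading level in one pass, a backreference can tie the closing tag to the opening one. This is a sketch, not taken from the linked source: `(h[1-6])` captures the tag name, `\b[^>]*` allows optional attributes such as a class, `</\1>` requires the matching closing tag, and `is` adds case-insensitivity and multi-line matching. It replaces with an empty string rather than ' ', and the input is hypothetical.

```php
<?php
// Hypothetical input with headings at several levels.
$incoming_data = '<h1>A</h1><p>x</p><h2 class="myclass">B</h2><h6>C</h6>';

// (h[1-6]) captures the tag name, \b ends it right after the digit,
// [^>]* skips any attributes, and </\1> requires the same closing tag.
$result = preg_replace('#<(h[1-6])\b[^>]*>(.*?)</\1>#is', '', $incoming_data);

echo $result;
```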

score -1 · Answer 6 · answered Jan 08 '15 at 11:01

$content = preg_replace('~(.*?)~', '', $content);

The code above only works if the opening and closing div tags are on the same line. What if they aren't?

$content = preg_replace('~[^|]*?~', '', $content);

This works even if there is a line break in between, but it fails if the rarely used | symbol appears in between. Does anyone know a better way?
