How to remove HTML tag if it contains specific string

Question

    <tr>
        <td width="300" bgcolor="#cccccc" style="text-align: right;">
         <strong>&nbsp;&nbsp;&nbsp;Sometext<br />
         </strong>
        </td>
        <td width="125" bgcolor="#009900" style="text-align: center;">
         <strong><span style="color: rgb(255, 255, 255);">
          <span style="font-size: larger;">Pricetoreplace</span>
          </span>
         </strong>
        </td>
    </tr>

I need to remove the whole <tr>...</tr> row if it contains the text "Pricetoreplace". I've tried the following:

$content = preg_replace('~(<tr.*[\'"]Pricetoreplace[\'"].*tr>)~', '', $content);

But it didn't work.

What do you mean "it didn't work"? Was there an error? Did it not delete anything? — kchason, Nov 15 '17 at 14:20
You should *never* parse HTML with regex. Use [a PHP DOM parser](http://simplehtmldom.sourceforge.net/) instead. — Jay Blanchard, Nov 15 '17 at 14:21
@gtktuf first off, you're going to replace everything from the first instance to the last `tr>` so your regex is not going to do what you expect (you use greedy quantifiers `.*` instead of lazy quantifiers `.*?`). Second, your `.` doesn't match new line characters, you should use `[\s\S]` instead or turn on the `s` flag to match newline characters with the `.` character. Again, though, you shouldn't even be using regex for this. — ctwheels, Nov 15 '17 at 14:24
@gtktuf you really should be using something like what [this](https://stackoverflow.com/questions/9478330/php-how-can-i-retrieve-a-div-tag-attribute-value) question does. — ctwheels, Nov 15 '17 at 14:26
@ctwheels Thx. Based on your first reference, I understand that regular expressions are not very suitable for this task, right? — gtktuf, Nov 15 '17 at 14:27
@gtktuf yes. It's usually bad practice to parse HTML or XML with regex. Regex should only be used for parsing HTML or XML if it's a known subset. In your case it doesn't appear to be so. I would recommend you use an HTML/XML parser and have it do the heavy lifting for you. — ctwheels, Nov 15 '17 at 14:29
If you *really* do want a regex for this, you can use something like `Pricetoreplace<[\s\S]*?tr>`, but I would **highly** recommend you don't go this direction. — ctwheels, Nov 15 '17 at 14:33
@ctwheels Thx again. I'll try something like this: [link](https://stackoverflow.com/questions/3308530/php-strip-a-specific-tag-from-html-string) — gtktuf, Nov 15 '17 at 14:34
@gtktuf you're welcome! I hope you find a proper solution. If you do, you can answer your own question. I would also suggest changing the title to something like `How to remove HTML tag if it contains specific string`. This would allow future users that might be struggling with the same issue to easily find a solution (yours) and might help generate traffic to this question (and hopefully give you upvotes). See [this](https://stackoverflow.com/help/how-to-ask) and [this](https://meta.stackexchange.com/questions/10647/how-do-i-write-a-good-title) for more info about writing effective titles. — ctwheels, Nov 15 '17 at 14:43
@gtktuf [this](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) post might help you. The second answer provides a method to get `tr` elements as well as extract the content, which should get you partway there. — ctwheels, Nov 15 '17 at 14:54
Possible duplicate of [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) — miken32, Nov 15 '17 at 21:07
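
The two regex problems raised in the comments (greedy `.*`, and `.` not matching newlines) can be seen in a small sketch. The sample markup and the second pattern are illustrative assumptions, not something posted in the thread. The pattern only holds up on simple, non-nested rows like these, which is why the comments recommend a DOM parser instead.

```php
<?php
// Illustrative sample markup (an assumption, not the asker's real content).
$content = <<<HTML
<table>
<tr><td>keep me</td></tr>
<tr>
  <td><span>Pricetoreplace</span></td>
</tr>
<tr><td>keep me too</td></tr>
</table>
HTML;

// The pattern from the question: "." does not cross newlines, and the
// pattern also demands quotes around the text, so nothing matches here.
$unchanged = preg_replace('~(<tr.*[\'"]Pricetoreplace[\'"].*tr>)~', '', $content);
var_dump($unchanged === $content);

// A sketch of a safer pattern for this simple case only:
// - the "s" flag lets "." match newlines,
// - (?:(?!</tr>).)*? is lazy and refuses to run past a closing </tr>,
//   so the match stays inside a single row.
$pattern = '~<tr\b[^>]*>(?:(?!</tr>).)*?Pricetoreplace(?:(?!</tr>).)*?</tr>\s*~s';
$cleaned = preg_replace($pattern, '', $content);
echo $cleaned;
```

Nested tables, commented-out markup, or a literal `</tr>` inside an attribute would all defeat this pattern.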

score 4 · Accepted Answer · answered Nov 15 '17 at 16:14

One way would be to use an xpath query:

*//td[contains(., 'Pricetoreplace')]/parent::tr

Here, we look for a td whose string value (`.`, i.e. the concatenated text of the element and all of its descendants) contains Pricetoreplace, and then select its parent tr. That row is then removed from the DOM.

In PHP:

<?php

$html = <<<DATA
    <tr><td class="some other class">some text here</td></tr>
   <tr>
        <td width="300" bgcolor="#cccccc" style="text-align: right;">
         <strong>&nbsp;&nbsp;&nbsp;Sometext<br />
         </strong>
        </td>
        <td width="125" bgcolor="#009900" style="text-align: center;">
         <strong><span style="color: rgb(255, 255, 255);">
          <span style="font-size: larger;">Pricetoreplace</span>
          </span>
         </strong>
        </td>
    </tr>
DATA;

# set up the DOM
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

# set up the xpath
$xpath = new DOMXPath($dom);

foreach ($xpath->query("*//td[contains(., 'Pricetoreplace')]/parent::tr") as $row) {
    $row->parentNode->removeChild($row);
}
echo $dom->saveHTML();
?>

This yields

<tr><td class="some other class">some text here</td></tr>

That's the answer, but in my case I needed to replace `$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);` with `$dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));` to solve some encoding problems. Also, there are no classes like `class="some other class"` anywhere in the posts that I need to rebuild with this PHP script; that was the main problem. Ty for this method. — gtktuf, Nov 16 '17 at 08:13
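
For the encoding issue mentioned in the comment above, one possible variant is sketched below. It is an assumption-laden sketch, not the answerer's code: the helper name `removeRowsContaining` is hypothetical, the input is assumed to be UTF-8, and the rows are assumed to sit inside a `<table>`. Instead of `mb_convert_encoding(..., 'HTML-ENTITIES', ...)` (the `HTML-ENTITIES` encoding is deprecated as of PHP 8.2), it prepends a `<meta>` charset hint. It also checks `textContent` in PHP rather than building the search string into the XPath expression, so quotes in the search text cause no trouble.

```php
<?php
// Hypothetical helper; name and signature are illustrative only.
function removeRowsContaining(string $html, string $needle): string
{
    if ($needle === '') {
        return $html; // nothing sensible to search for
    }

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // The meta tag tells the parser that the input bytes are UTF-8.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOMXPath::query() returns a static list, so removing nodes while
    // iterating over it is safe.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//tr') as $row) {
        if (strpos($row->textContent, $needle) !== false && $row->parentNode !== null) {
            $row->parentNode->removeChild($row);
        }
    }

    // Without LIBXML_HTML_NOIMPLIED the parser adds <html>/<body>;
    // serialize only the children of <body> to get the fragment back.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage with made-up sample data:
$sample = '<table><tr><td>Preis: 5 €</td></tr>'
        . '<tr><td><span>Pricetoreplace</span></td></tr></table>';
echo removeRowsContaining($sample, 'Pricetoreplace');
```

Note that with nested tables an outer row also "contains" the text of its inner rows, so the outer row would be removed as a whole.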
