PHP preg_replace()

Question

I am trying to remove the following pattern from a string:

<div class="main_title">Content 1</div>

where 'Content 1' may vary between strings.

The following does not seem to be working:

$output = preg_replace('<div class="main_title">.*</div>', " ", $output);

Am I missing something obvious?

`Am I missing something obvious?` You're trying to parse HTML with regular expressions. — , May 28 '13 at 21:38
Do not parse HTML with a regular expression! http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Micha Wiedenmann, May 28 '13 at 21:38
See [these](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml/3577662#3577662) [answers](http://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element/3820783#3820783) for a [better way](http://stackoverflow.com/questions/4979836/noob-question-about-domdocument-in-php/4983721#4983721). — George Cummins, May 28 '13 at 21:41

score 3 · Answer 1 · answered May 28 '13 at 21:50

The DOM method is probably superior because you don't have to worry about case sensitivity, whitespace, etc.

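// Parse the markup; $html holds the input string.
$dom = new DOMDocument;
$dom->loadHTML($html);
// Select every <div> whose class attribute is exactly "main_title".
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@class="main_title"]') as $node) {
    // Detach the matched element from its parent.
    $node->parentNode->removeChild($node);
}
// Serialize the modified document back to a string.
$output = $dom->saveHTML();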
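
The whitespace point can be illustrated with a variation on the snippet above. This is only a sketch: the sample markup, the extra `big` class, and the `wrap` id are made up for the example, and the XPath expression is the common idiom for matching one class name among several, not something taken from the question.

```php
<?php
// Hypothetical input: the title div carries a second class name.
$html = '<div id="wrap"><div class="main_title big">Content 1</div><p>Body</p></div>';

$dom = new DOMDocument;
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "main_title" as one of several space-separated class names.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " main_title ")]';
foreach ($xpath->query($query) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize the document; the title div is gone, the paragraph remains.
$output = $dom->saveHTML();
```

Note that `saveHTML()` on a document loaded this way serializes a full document, so the result typically includes a doctype and `<html>`/`<body>` wrappers around the fragment.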

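It's possible to do this with regex, especially if you can trust that your input will follow a very specific format (no extra whitespace, perhaps no case discrepancies, etc.). Your main issue is a lack of PCRE delimiters: PHP reads the leading `<` and the first `>` as a delimiter pair, treats everything after that `>` as pattern modifiers, and the call fails with a warning.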

$output = preg_replace('@<div class="main_title">.*?</div>@', '', $output);
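
The lazy quantifier `.*?` in that pattern matters as soon as the input contains more than one matching div. A minimal sketch, assuming a made-up one-line input; the `s` modifier is added here so that `.` also matches newlines, which the line above does not do:

```php
<?php
// Hypothetical sample input with two title blocks.
$input = '<div class="main_title">Content 1</div><p>Body</p><div class="main_title">Content 2</div>';

// Lazy quantifier: each title div is matched and removed separately.
$lazy = preg_replace('@<div class="main_title">.*?</div>@s', '', $input);

// Greedy quantifier: one match runs from the first opening tag to the last </div>.
$greedy = preg_replace('@<div class="main_title">.*</div>@s', '', $input);

var_dump($lazy);   // string(11) "<p>Body</p>"
var_dump($greedy); // string(0) ""
```

With the greedy version the paragraph between the two titles is lost as well, which is rarely what is wanted.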

score 1 · Accepted Answer · answered May 28 '13 at 21:49

As others say in the comments, don't use regular expressions to parse HTML; use SimpleXML or DOMDocument instead. If you still need a regex, you need to put pattern delimiters around the pattern:

$output = preg_replace('#<div class="main_title">.*?</div>#', " ", $output);
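
To see what the delimiters change, the call from the question can be compared with a delimited one. This is a sketch with a made-up input string; the `@` operator is used only to silence the warning in the demo, and the delimited pattern uses a lazy `.*?` so it stops at the first closing tag.

```php
<?php
// Hypothetical sample input.
$input = '<div class="main_title">Content 1</div><p>Body</p>';

// No explicit delimiters: "<" and ">" are taken as a delimiter pair, so the
// rest of the string is read as modifiers and the pattern fails to compile.
$broken = @preg_replace('<div class="main_title">.*</div>', ' ', $input);

// Explicit "#" delimiters: the pattern compiles and the div is replaced by a space.
$fixed = preg_replace('#<div class="main_title">.*?</div>#', ' ', $input);

var_dump($broken); // NULL
var_dump($fixed);  // string(12) " <p>Body</p>"
```

Checking the return value for `null` is a cheap way to catch a pattern that did not compile.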
