I agree with MarcB. In general, it's better to use a DOM parser when manipulating HTML. But here is a regex, based on smottt's answer, that might work:
$html = preg_replace('~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~i', '', $html);
- Use [^>]* and [^<]* instead of .* . In my testing, .*? doesn't work. If a non-matching div comes before a matching div, it will match the first div, everything in between, and the last div. For example, it incorrectly matches against this entire string: <div></div><b>hello</b><div class="removeMe">bar</div>
- Take into account the fact that you can use single quotes with HTML attributes.
- Also remember that there can be whitespace around the equals sign.
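To make the points in the list concrete, here is a small sketch that can be run as-is. The $naive pattern and the second sample string are my own illustrations, not something from the question; the $strict pattern is the one posted above.

```php
<?php
// Sample string from the first bullet: a non-matching div comes first.
$html = '<div></div><b>hello</b><div class="removeMe">bar</div>';

// Illustrative naive pattern (an assumption, not from the question).
// .*? is lazy, but the match still starts at the FIRST "<div".
$naive = '~<div.*?class="removeMe".*?>.*?</div>~i';

// Pattern from this answer: [^>]* cannot run past the end of the opening tag.
$strict = '~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~i';

$naiveResult  = preg_replace($naive, '', $html);
$strictResult = preg_replace($strict, '', $html);

var_dump($naiveResult);   // string(0) "" : everything was removed
var_dump($strictResult);  // string(23) "<div></div><b>hello</b>"

// Hypothetical input with single quotes and spaces around the equals sign.
$quoted = "<div id=\"x\" class = 'removeMe'>gone</div><p>kept</p>";
$quotedResult = preg_replace($strict, '', $quoted);

var_dump($quotedResult);  // string(11) "<p>kept</p>"
```

The naive pattern swallows the whole string because the regex engine commits to the first "<div" it finds and then stretches .*? as far as needed to reach the class attribute.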
You should use the "m" modifier too so that it takes line breaks into account (see this page).
I added parentheses for clarity, but they aren't needed. Let me know if this works or not.
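For comparison, the DOM approach recommended at the top could look roughly like the sketch below. It uses PHP's built-in DOMDocument and DOMXPath; the sample input is made up, and the XPath expression is just one common way to test for a class name, so treat this as a starting point rather than a drop-in solution.

```php
<?php
// Hypothetical input; only the class name "removeMe" comes from the question.
$html = '<div></div><b>hello</b><div id="x" class="big removeMe">bar' . "\n" . 'baz</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);              // libxml wraps the fragment in <html><body>
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Matches "removeMe" as a whole class token, even among several classes.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " removeMe ")]';

// Copy the node list to an array before removing nodes from the tree.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize the children of <body> to get the fragment back.
$result = '';
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result, "\n";
```

Unlike the regex, this does not care about quote style, whitespace around the equals sign, newlines inside the div, or nested divs.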
EDIT: Actually, never mind, the "m" modifier won't do anything here. It only changes where ^ and $ match, and this pattern uses neither.
EDIT2: Improved the regex, but it still fails if there are any newlines inside the div, because . does not match a newline unless the "s" modifier is set.
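The newline failure can be reproduced with the sketch below, which also shows that appending the "s" (dot-all) modifier seems to be enough for the simple case. The input string is hypothetical, and I have only reasoned about flat markup: a removeMe div that contains another div would still be cut off at the first </div>.

```php
<?php
// Hypothetical input: the content of the matching div spans two lines.
$html = "<p>keep</p><div class=\"removeMe\">line one\nline two</div>";

// Pattern from this answer, without the trailing modifiers.
$pattern = '~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~';

// "i" only: . stops at the newline, so nothing matches and nothing is removed.
$withoutS = preg_replace($pattern . 'i', '', $html);

// "is": . also matches newlines, so the whole div is removed.
$withS = preg_replace($pattern . 'is', '', $html);

var_dump($withoutS === $html);  // bool(true)
var_dump($withS);               // string(11) "<p>keep</p>"
```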