2

I'm trying to remove some elements with preg_replace but can't get it to work consistently. I would like to remove an element with a matching class. The problem is that the element may also have an ID or several classes.

For example, the element could be

<div id="me1" class="removeMe">remove me and my parent</div> 

or

<div id="me1" class="removeMe" style="display:none">remove me and my parent</div>

Is it possible to do this?

Any help appreciated! Dan.

Deduplicator
v3nt
  • You do NOT want to do html manipulations with regexes. A better method would be to load the html into the DOM system and remove the div nodes from there. Otherwise you're very likely to mangle the document completely, as regexes can't handle HTML properly with a 100% accuracy guarantee. – Marc B Mar 01 '11 at 18:05
  • http://stackoverflow.com/questions/3368771/regex-match-html-tag-only-if-it-contains-a-specific-class-id – Brandon Frohbieter Mar 01 '11 at 18:06
  • You could do this easily with jQuery .hasClass() – Brandon Frohbieter Mar 01 '11 at 18:07
  • @Marc B - I would suggest making that an answer, as it would be the correct approach. – John Cartwright Mar 01 '11 at 18:09
  • possible duplicate of [Replace UL tags with specific class](http://stackoverflow.com/questions/3804339/replace-ul-tags-with-specific-class) – Gordon Mar 01 '11 at 18:12
  • *(related)* http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element/3820783#3820783 – Gordon Mar 01 '11 at 18:16

3 Answers

4

I agree with MarcB. Overall, it's better to use a DOM when manipulating HTML. But here is a regex based on smottt's answer that might work:

$html = preg_replace('~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~i', '', $html);
  • Use [^>]* and [^<]* instead of .*. In my testing, .*? doesn't work. If a non-matching div comes before a matching div, it will match the first div, everything in between, and the last div. For example, it incorrectly matches against this entire string: <div></div><b>hello</b><div class="removeMe">bar</div>
  • Take into account the fact that you can use single quotes with HTML attributes.
  • Also remember that there can be whitespace around the equals sign.
  • You should use the "m" modifier too so that it takes line breaks into account (see this page).

I added parentheses for clarity, but they aren't needed. Let me know if this works or not.

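EDIT: Actually, never mind, the "m" modifier won't do anything here: it only changes where ^ and $ match. EDIT2: Improved the regex, but it still fails if there are any newlines in the div, because "." does not match a newline unless the "s" modifier is set.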
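For the newline problem mentioned in EDIT2, here is a minimal sketch that adds the "s" modifier and matches the class as a whole word inside the attribute, so extra classes, an ID, or a style attribute do not matter. The helper name removeDivsByClass is hypothetical, and the sketch assumes the matching div contains no nested div (the lazy body stops at the first closing tag). The "i" modifier is kept from the regex above, so the class name is compared case-insensitively.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Removes <div> elements whose class attribute contains $class as a whole word.
// Assumption: the matching <div> has no nested <div> inside it.
function removeDivsByClass(string $html, string $class): string
{
    // Escape the class name for use inside a pattern delimited by "~".
    $name = preg_quote($class, '~');

    $pattern = '~<div\b[^>]*\sclass\s*=\s*(["\'])'   // opening tag up to the quote
             . '(?:(?!\1).)*?(?<![\w-])' . $name     // other classes, then the name
             . '(?![\w-])(?:(?!\1).)*?\1'            // rest of the value, closing quote
             . '[^>]*>.*?</div>~is';                 // "s" lets "." match newlines

    return preg_replace($pattern, '', $html);
}

$html = "<p>a</p><div id=\"me1\" class='x removeMe' style=\"display:none\">line1\nline2</div><p>b</p>";
echo removeDivsByClass($html, 'removeMe');
```

The lookarounds around the class name keep a class such as removeMeNot from matching. Nested divs still need a DOM-based approach.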

Michael
  • Thanks Michael. Everyone's right about the DOM, but I'm just trying to modify an existing function in WordPress that uses preg_replace, and I really just want to understand this better. None of what anyone has posted has worked?! Not sure if my environment can affect it, but this works for images: $content = preg_replace("/<img[^>]+\>/i", "", $content); I just need to add the class match bit and change it to a div. I just can't get it and have tried so many versions! – v3nt Mar 02 '11 at 13:29
  • What does the HTML that you are working with look like? In my testing, my regex worked well. Are you sure you're not modifying it when inserting it into your code? For example, the regex cannot be enclosed in "/" characters like most regexes in PHP are because there is a "/" character in the regex itself. Also, in your "img" tag regex, you have a backslash right before the ">"?? What's that about? – Michael Mar 04 '11 at 14:12
  • Actually, it was having problems if there were any tags nested inside of the div. The edited version above works a little better, but fails for some reason if there are any newlines inside the div. I replaced the "([^<]*)" inside the div body with "(.*?)". – Michael Mar 04 '11 at 14:39
2

While this is still doable with regular expressions, it's much simpler with e.g. QueryPath:

print qp($html)->find(".removeMe")->parent()->remove()->writeHTML();
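If installing QueryPath is not an option, roughly the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). The helper name removeByClass and the wrapper id are hypothetical. The sketch removes the matching element itself rather than its parent, assumes the class name contains no quote characters, and leaves character-encoding handling aside.

```php
<?php
// Hypothetical helper: removes every element that carries $class.
// Assumption: $html is a fragment and $class contains no quote characters.
function removeByClass(string $html, string $class): string
{
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc = new DOMDocument();
    // Wrap the fragment so it has a single, easy-to-find root element.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Whole-word match on the class attribute, so several classes are fine.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    // Copy the result to an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the added html/body.
    $out = '';
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeByClass('<p>a</p><div id="me1" class="removeMe">gone</div><p>b</p>', 'removeMe');
```

To remove the parent instead, as the QueryPath one-liner does, the loop could call removeChild on the parent of $node->parentNode. The serializer may insert line breaks between block elements, so the output is not guaranteed to be byte-identical to the input minus the removed nodes.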
mario
1

With preg_replace:

preg_replace('~<div([^>]*)class="(.*?)gallery(.*?)">(.*?)</div>~im', '', $html);
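The comments on this answer turn on greedy versus non-greedy matching. A small sketch of the difference, using made-up sample HTML and a simplified pattern (a single class, double quotes only):

```php
<?php
// Made-up sample: two gallery divs with a paragraph in between.
$html = '<div class="gallery">a</div><p>keep</p><div class="gallery">b</div>';

// Greedy ".*" runs to the LAST </div>, swallowing the paragraph too.
$greedy = preg_replace('~<div[^>]*class="gallery"[^>]*>.*</div>~i', '', $html);

// Lazy ".*?" stops at the FIRST </div>, so each div is removed separately.
$lazy = preg_replace('~<div[^>]*class="gallery"[^>]*>.*?</div>~i', '', $html);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(11) "<p>keep</p>"
```

Using [^>]* inside the opening tag keeps that part of the match from crossing into the next tag, whichever quantifier is used for the body.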
smottt
  • This does not take into account multiple classes. Plus, you want to use a non-greedy pattern by changing your .* to .*? – John Cartwright Mar 01 '11 at 18:09
  • It should use `[^>]*` and the ungreedy `.*?` rather, else it will match too much. – mario Mar 01 '11 at 18:09
  • OK, thanks for the responses. I've got this now, but it's not working: $content = preg_replace('~
    ~i', '', $content); Basically I want to remove a gallery element before it tries to load anything. I would rather avoid jQuery for this if possible. Example of the element is . Thanks, D.
    – v3nt Mar 01 '11 at 18:17
  • Thanks for correcting me. @Daniel, the regex should now be ~<div([^>]*)class="(.*?)gallery(.*?)">(.*?)</div>~i if I'm not mistaken.
    – smottt Mar 01 '11 at 18:58
  • You may also need to add multiline support: – Corey Ballou Aug 28 '12 at 13:31