1

Original code:

<div style="height:100px;" id="main" >
<a href="133"></a>
<blockquote color="123">

After replacement (every attribute except style removed):

<div style="height:100px;" >
<a></a>
<blockquote>

I tried this regex, but it does not work:

preg_replace('#<(div|span|a|img|ul|li|blockquote).*( style=".*")?(.*)>#Us', '<$1$2>', $content);

Can anyone help me solve this problem? Thank you!

Quentin
  • The `( style=".*")` will match from `style="` to the last appearance of `"` inside the tag. – hjpotter92 Apr 15 '12 at 15:58
  • Yes, that is what I am facing. I already use the `Us` modifiers to avoid that. What should it be? – user1334715 Apr 15 '12 at 15:59
  • I'm not sure, but does PHP regex have `.+` matching? – hjpotter92 Apr 15 '12 at 16:01
  • Yes, it has `.+`, but the last `?` confuses me. With the `?` in place, the group cannot match anything. With the `?` removed, the pattern no longer handles tags that have no style attribute: `<(div|span|a|img|ul|li|blockquote).*(style=".+")?.*>` – user1334715 Apr 15 '12 at 16:05
  • Your pattern breaks very easily: `` (the HTML does not make sense, but it can be parsed). – Rob W Apr 15 '12 at 16:19
  • [Don't try to use regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for this. – Quentin Apr 15 '12 at 17:31
  • possible duplicate of [How to parse and process HTML with PHP?](http://stackoverflow.com/q/3577641/) – outis Apr 25 '12 at 19:01
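
The behaviour discussed in the comments can be reproduced in a few lines. This is only a sketch: JavaScript has no `U` modifier, so the pattern below writes every quantifier lazily by hand, which is assumed to be equivalent to the PHP pattern for this input. Because the optional style group is lazy as well, the engine first tries to skip it, and the trailing `(.*?)` then absorbs the whole attribute list, so group 2 never captures anything.

```javascript
// Assumed JavaScript equivalent of the question's pattern under PHP's U modifier:
// every quantifier, including the optional style group, is lazy.
const lazyPattern = /<(div|span|a|img|ul|li|blockquote).*?( style=".*?")??(.*?)>/s;
const input = '<div style="height:100px;" id="main" >';

const [, tag, style, rest] = input.match(lazyPattern);
console.log(tag);   // div
console.log(style); // undefined: the lazy optional group was skipped
console.log(rest);  // the third group swallowed the style attribute instead

// With group 2 empty, the replacement drops the style attribute too.
const result = input.replace(lazyPattern, (m, t, s) => `<${t}${s || ''}>`);
console.log(result); // <div>
```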

2 Answers

1

Not recommending regex, but this probably works.

Edit: fixed option group, was in the wrong place.

Test case here: http://ideone.com/vRk1u

'~
( < (?:div|span|a|img|ul|li|blockquote) (?=\s) )         # 1
   (?= 
     (?:
        (?:[^>"\']|"[^"]*"|\'[^\']*\')*? 
        (                                                      # 2
          \s  style \s*=
          (?: (?>  \s* ([\'"]) \s* (?:(?!\g{-1}) .)* \s* \g{-1} )  #3
            | (?>  (?!\s*[\'"]) \s* [^\s>]* (?=\s|>) )
          )
        )
     )?
   )
  \s* (?:".*?"|\'.*?\'|[^>]*?)+ 
( /?> )                                                  # 4
~xs'
0

I do not have PHP available at the moment, so I'll write the regex in JavaScript, and you can port it easily. (I use the RegExp object, so the pattern is already quoted as a string for you.)

'<div style="height:100px;" id="main" >'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style="height:100px;">

'<div style=\'height:100px;\' id="main" >'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style='height:100px;'>

'<div style="height:100px;">'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div style="height:100px;">

'<div dfg dfg fdg>'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div>

'<div>'.replace(new RegExp('<([a-zA-Z0-9]*)(.*([ \t\r\n]style[ \t\r\n]*=[ \t\r\n]*(("[^"]*")|(\'[^\']*\'))))*[^>]*>'), '<$1$3>')
 == <div>

So it's one regex that takes into account most possible situations.

Does this answer your question?

(Btw, you can replace those `[ \t\r\n]` classes with the whitespace shorthand `\s`; PHP's PCRE supports it, and it matches line breaks regardless of mode.)
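
A two-step variant may be easier to reason about when the input holds several tags, since a greedy `.*` in a one-pass pattern can run past the end of the first tag. The sketch below is not a full HTML parser: it first matches each opening tag from the question's list, then walks that tag's attributes one by one and keeps only `style`. The names `keepOnlyStyle`, `openTag`, and `attribute` are hypothetical, chosen for illustration.

```javascript
// Tag list taken from the question.
const TAGS = 'div|span|a|img|ul|li|blockquote';

// Step 1: an opening tag from the list. The attribute part accepts quoted
// values whole, so a '>' inside quotes does not end the tag early.
const openTag = new RegExp(`<(${TAGS})(?=[\\s/>])((?:"[^"]*"|'[^']*'|[^>"'])*)>`, 'gi');

// Step 2: one attribute, with an optional double-quoted, single-quoted,
// or unquoted value.
const attribute = /([^\s"'=\/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?/g;

// Hypothetical helper: rebuilds each listed tag with only its style attribute.
// A trailing self-closing slash is dropped as well.
function keepOnlyStyle(html) {
  return html.replace(openTag, (match, tag, attrs) => {
    let style = '';
    for (const [, name, value] of attrs.matchAll(attribute)) {
      if (name.toLowerCase() === 'style' && value !== undefined) {
        style = ` style=${value}`; // keep the original quoting
        break;
      }
    }
    return `<${tag}${style}>`;
  });
}

console.log(keepOnlyStyle('<div style="height:100px;" id="main" >'));
console.log(keepOnlyStyle('<a href="133"></a>'));
console.log(keepOnlyStyle('<blockquote color="123">'));
```

In PHP the same structure would map onto `preg_replace_callback` for the outer match and `preg_match_all` for the attributes. Because each attribute is consumed whole, text such as `style=x` inside another attribute's quoted value is not mistaken for a real style attribute.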

daniel.gindi