I found that it is very difficult to build a single regex that removes both class and style attributes from a tag in a single pass. That's because we don't know where these attributes will appear among the other attributes inside the tag (assuming we want to preserve the other ones). However, we can achieve the same result by splitting the task into two simpler search-and-replace operations: one for the class attribute and another for the style attribute.
To capture the first part of a div containing a class attribute, with one or more values enclosed in double quotes, the regex is as follows:
(<div\s+)([^>]*)(class\s*=\s*\"[^\">]*\")(\s|/|>)
The same code modified for single quotes:
(<div\s+)([^>]*)(class\s*=\s*\'[^\'>]*\')(\s|/|>)
Or no quotes:
(<div\s+)([^>]*)(class\s*=\s*[^\"\'=/>\s]+)(\s|/|>)
The matched string must then be replaced by the first, second and fourth capture groups, which in PHP preg_replace() is written as the replacement string $1$2$4.
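As a minimal sketch of that replacement, assuming PHP and only the double-quote variant (the sample $html string and the variable names are made up for illustration):

```php
<?php
// Double-quote variant of the pattern, wrapped in ~ delimiters so that the
// / inside the pattern needs no extra escaping.
$pattern = '~(<div\s+)([^>]*)(class\s*=\s*\"[^\">]*\")(\s|/|>)~';

// Hypothetical sample input.
$html = '<div id="main" class="box wide" title="x">text</div>';

// Keep groups 1, 2 and 4; drop group 3 (the class attribute itself).
$result = preg_replace($pattern, '$1$2$4', $html);

echo $result, PHP_EOL;
```

Note that the whitespace on both sides of the removed attribute is kept, so the result has two spaces between id="main" and title="x". That is harmless in HTML, but worth knowing if you compare strings afterwards.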
To remove a style attribute instead of a class attribute, just replace the substring class with the substring style in the regex. To remove these attributes from any tag (not only divs), replace the substring div with the substring [a-z][a-z0-9]* in the regex.
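Putting the pieces together, here is one possible sketch rather than a definitive implementation. The function name strip_attribute() is hypothetical, and the i flag is my own assumption, added so that uppercase tag and attribute names also match. It runs the three quoting variants one after another for a given attribute, and is called once for class and once for style:

```php
<?php
// Hypothetical helper (not part of the original answer): removes one
// attribute, such as class or style, from the opening tags in $html.
function strip_attribute(string $html, string $attr): string
{
    $name = preg_quote($attr, '~');
    // The three value syntaxes covered by the regexes above.
    $values = [
        '"[^">]*"',      // double-quoted value
        "'[^'>]*'",      // single-quoted value
        '[^"\'=/>\s]+',  // unquoted value
    ];
    foreach ($values as $value) {
        // Assumption: i flag added so uppercase tags and attributes match too.
        $pattern = '~(<[a-z][a-z0-9]*\s+)([^>]*)(' . $name . '\s*=\s*' . $value . ')(\s|/|>)~i';
        $html = preg_replace($pattern, '$1$2$4', $html);
    }
    return $html;
}

// Two simple operations instead of one complicated regex.
$html = '<p class="a b" style=\'color:red\' id=x>hi</p><IMG style=width:1px src="a.png">';
$clean = strip_attribute(strip_attribute($html, 'class'), 'style');
echo $clean, PHP_EOL;
```

The other attributes (id and src in this made-up input) are preserved; only the whitespace around the removed attributes is left behind.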
Note: the regexes above will not remove class or style attributes that contain syntax errors. Examples: class="xxxxx (missing the closing quote), class='xxxxx'' (extra quote after the value), class="xxxx"title="yyyy" (no space between attributes), and so on.
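This limitation is easy to check. The sketch below, with made-up inputs modelled on the examples in the note, runs the double-quote pattern against two malformed tags and confirms that nothing is changed:

```php
<?php
$pattern = '~(<div\s+)([^>]*)(class\s*=\s*\"[^\">]*\")(\s|/|>)~';

// Hypothetical malformed inputs, modelled on the examples in the note.
$samples = [
    '<div class="xxxxx>text</div>',              // missing closing quote
    '<div class="xxxx"title="yyyy">text</div>',  // no space between attributes
];

foreach ($samples as $html) {
    $result = preg_replace($pattern, '$1$2$4', $html);
    // The pattern finds no match, so preg_replace returns the input unchanged.
    var_dump($result === $html);
}
```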
Short explanation:
<div\s+ # beginning of the div tag, followed by one or more whitespaces
[^>]* # any set of attributes before the class (optional)
class\s*=\s*\"[^\">]*\" # class attribute, with optional whitespaces
\s|/|> # one of these characters always follows the end of an attribute
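If you want to keep this kind of annotation in real code, PCRE's x flag lets you write the pattern almost exactly as laid out above: literal whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A small sketch, again with a made-up input string:

```php
<?php
// Same double-quote pattern, written in extended (x) mode with comments.
$pattern = '~
    (<div\s+)                  # beginning of the div tag, followed by whitespace
    ([^>]*)                    # any attributes before the class (optional)
    (class\s*=\s*"[^">]*")     # class attribute, with optional whitespace around =
    (\s|/|>)                   # character that follows the end of an attribute
~x';

$html = '<div class="note">hello</div>';  // hypothetical sample input
$result = preg_replace($pattern, '$1$2$4', $html);
echo $result, PHP_EOL;
```

The comments must not contain the delimiter character (~ here), because PHP would take it as the end of the pattern.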