First, the proof of concept. Check out the Rubular demo.
The regex goes like this:
/(<[^>]+\s+)(?:style\s*=\s*"(?!(?:|[^"]*[;\s])color\s*:[^";]*)(?!(?:|[^"]*[;\s])background-color\s*:[^";]*)[^"]*"|(style\s*=\s*")(?=(?:|[^"]*[;\s])(color\s*:[^";]*))?(?=(?:|[^"]*)(;))?(?=(?:|[^"]*[;\s])(background-color\s*:[^";]*))?[^"]*("))/i
Broken down, it means:
(<[^>]+\s+) Capture start tag to style attr ($1).
(?: CASE 1:
style\s*=\s*" Match style attribute.
(?! Negative lookahead assertion, meaning:
(?:|[^"]*[;\s]) If color found, go to CASE 2.
color\s*:[^";]*
)
(?! Negative lookahead assertion, meaning:
(?:|[^"]*[;\s]) If background-color found, go to CASE 2.
background-color\s*:[^";]*
)
[^"]*" Match the rest of the attribute.
| CASE 2:
(style\s*=\s*") Capture style attribute ($2).
(?= Positive lookahead.
(?:|[^"]*[;\s])
(color\s*:[^";]*) Capture color style ($3),
)? if it exists.
(?= Positive lookahead.
(?:|[^"]*)
(;) Capture semicolon ($4),
)? if it exists.
(?= Positive lookahead.
(?:|[^"]*[;\s])
(background-color\s*:[^";]*) Capture background-color style ($5),
)? if it exists.
[^"]*(") Match the rest of the attribute,
capturing the end-quote ($6).
)
Now, the replacement,
\1\2\3\4\5\6
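should always reconstruct exactly what you expect to have left: the start of the tag, plus a style attribute holding only the color and background-color declarations, or no style attribute at all when neither is present.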
The trick here, in case it's not clear, is to put the "negative" case first. That branch contains no capture groups, so when it matches (neither color nor background-color is present), $2 through $6 stay empty and not even the style attribute shows up in the output. Only when the negative case fails does the alternate case run and populate the captures, including the style attribute itself.
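To see the "negative case first" idea in isolation, here is a much smaller sketch of the same technique. The attribute name (title), the keyword (keep) and the function name dropUnlessKeep are made up for illustration and are not part of the regex above.

```javascript
// Branch 1 has no capture group and matches title attributes WITHOUT "keep".
// Branch 2 is only tried when branch 1 fails, and captures the whole attribute.
function dropUnlessKeep(html) {
  const pattern = /title="(?![^"]*keep)[^"]*"|(title="[^"]*")/g;
  // `kept` is undefined whenever branch 1 matched, so the attribute vanishes.
  return html.replace(pattern, (match, kept) => kept || '');
}

console.log(dropUnlessKeep('<a title="keep me"> <a title="drop me">'));
// <a title="keep me"> <a >
```

The full regex works the same way, only with more captures in the second branch.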
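To do this in JavaScript, one change is needed. In JavaScript an optional group that can only match the empty string, such as (?=...)?, never keeps its captures, so each optional lookahead is written as (?:(?=...)|) instead:
htmlString = htmlString.replace(
  /(<[^>]+\s+)(?:style\s*=\s*"(?!(?:|[^"]*[;\s])color\s*:[^";]*)(?!(?:|[^"]*[;\s])background-color\s*:[^";]*)[^"]*"|(style\s*=\s*")(?:(?=(?:|[^"]*[;\s])(color\s*:[^";]*))|)(?:(?=(?:|[^"]*)(;))|)(?:(?=(?:|[^"]*[;\s])(background-color\s*:[^";]*))|)[^"]*("))/gi,
  function (match, $1, $2, $3, $4, $5, $6) {
    // $2 and $6 are only set in CASE 2. $4 (the captured semicolon) is not
    // used here, because the semicolons are appended explicitly.
    return $1 + ($2 || '') + ($3 ? $3 + ';' : '')
      + ($5 ? $5 + ';' : '') + ($2 ? $6 : '');
  }
);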
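A quick way to sanity-check the replacement is to run it over a few sample tags. The sketch below wraps the pattern in a hypothetical helper, keepColorStyles (the name and the sample inputs are mine, not part of the answer above).

```javascript
// Hypothetical helper wrapping the replacement; the name is an assumption.
function keepColorStyles(htmlString) {
  // Optional lookaheads are spelled (?:(?=...)|) rather than (?=...)?,
  // because JavaScript does not keep captures from an optional group
  // that can only match the empty string.
  const pattern = /(<[^>]+\s+)(?:style\s*=\s*"(?!(?:|[^"]*[;\s])color\s*:[^";]*)(?!(?:|[^"]*[;\s])background-color\s*:[^";]*)[^"]*"|(style\s*=\s*")(?:(?=(?:|[^"]*[;\s])(color\s*:[^";]*))|)(?:(?=(?:|[^"]*)(;))|)(?:(?=(?:|[^"]*[;\s])(background-color\s*:[^";]*))|)[^"]*("))/gi;
  return htmlString.replace(pattern, (match, $1, $2, $3, $4, $5, $6) =>
    $1 + ($2 || '') + ($3 ? $3 + ';' : '') + ($5 ? $5 + ';' : '') + ($2 ? $6 : ''));
}

console.log(keepColorStyles('<p style="color: red; font-size: 12px">Hi</p>'));
// <p style="color: red;">Hi</p>

console.log(keepColorStyles('<p style="font-size: 12px">Hi</p>'));
// <p >Hi</p>

console.log(keepColorStyles('<div style="background-color: #fff; margin: 0; color: blue">x</div>'));
// <div style="color: blue;background-color: #fff;">x</div>
```

Note that the whitelisted declarations always come out in capture order (color, then background-color), and that a removed style attribute leaves its leading whitespace behind.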
Note that I'm doing this for fun, not because this is how this problem should be solved. Also, I'm aware that the semicolon-capture is hacky, but it's one way of doing it. And one can infer how to extend the whitelist of styles, looking at the breakdown above.
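For what it's worth, extending the whitelist can be sketched by generating the two cases from a list of property names: one negative lookahead per property in CASE 1, one capturing lookahead per property in CASE 2. The function names buildStyleWhitelistRegex and keepWhitelistedStyles are hypothetical, and this variant drops the semicolon capture because the callback appends the semicolons itself. It is a sketch under the same assumptions as the regex above (double-quoted attributes, no ">" inside attribute values), not a robust HTML sanitizer.

```javascript
// Build the regex for an arbitrary whitelist of CSS property names.
function buildStyleWhitelistRegex(properties) {
  // "Start of the attribute value, or just after a semicolon/whitespace."
  const start = '(?:|[^"]*[;\\s])';
  // Escape regex metacharacters in the property names.
  const escaped = properties.map(p => p.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // CASE 1: one negative lookahead per whitelisted property.
  const negatives = escaped.map(p => `(?!${start}${p}\\s*:[^";]*)`).join('');
  // CASE 2: one optional capturing lookahead per whitelisted property.
  const captures = escaped.map(p => `(?:(?=${start}(${p}\\s*:[^";]*))|)`).join('');
  return new RegExp(
    `(<[^>]+\\s+)(?:style\\s*=\\s*"${negatives}[^"]*"` +
    `|(style\\s*=\\s*")${captures}[^"]*("))`,
    'gi'
  );
}

function keepWhitelistedStyles(html, properties) {
  const regex = buildStyleWhitelistRegex(properties);
  return html.replace(regex, (match, tagStart, styleOpen, ...rest) => {
    // rest = [one capture per property..., closing quote, offset, input]
    if (!styleOpen) return tagStart; // CASE 1: drop the style attribute
    const kept = rest.slice(0, properties.length).filter(Boolean);
    const closingQuote = rest[properties.length];
    return tagStart + styleOpen + kept.map(d => d + ';').join('') + closingQuote;
  });
}

console.log(keepWhitelistedStyles(
  '<p style="color: red; font-size: 12px; margin: 0">', ['color', 'margin']));
// <p style="color: red;margin: 0;">
```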