
I'm looking for a regex pattern that will look for an attribute within an HTML tag. Specifically, I'd like to find all instances of ...

style=""

... and remove it from the HTML tag that contains it. Obviously this would include everything inside the double quotes as well.

I'm using Classic ASP to do this. I already have a function set up with a different regex pattern that finds all HTML tags in a string and removes them. It works great. Now I just need another pattern specifically for removing all of the style attributes.

Any help would be greatly appreciated.

jkupczak
  • [You shouldn't try to parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Bohemian Sep 23 '11 at 12:50
  • Use a parser. The "L" in "HTML" is the clue - "language". You might be lucky if your HTML is strict, but to be safe you'll need to parse it like a language - token by token. – Bohemian Sep 23 '11 at 13:25
  • Use a parser, got it. Unfortunately, I know even less about parsers than I do about regexes. I'm going to do some research on them, but in the meantime, Jason Gennaro came up with a regex that gets the job done. Thanks. – jkupczak Sep 23 '11 at 14:30
  • Generally, you don't need to implement your own parser; you need a library. With a well-written one, it's much easier this way. – d33tah Sep 03 '14 at 17:21

9 Answers


Perhaps a simpler expression is

 style="[^\"]*"

That is, it matches everything between the double quotes except a double quote itself.

CpILL

I think this might do it:

/style="[a-zA-Z0-9:;\.\s\(\)\-\,]*"/gi

You could also put these in capturing groups if you wanted to replace only some parts:

/(style=")([a-zA-Z0-9:;\.\s\(\)\-\,]*)(")/gi

Working Example: http://regexr.com?2up30

Jason Gennaro
  • Close, but it's not taking hyphens into account. It stops matching as soon as it reaches one. Like text-align and background-color for example. http://regexr.com?2up1m – jkupczak Sep 23 '11 at 13:28
  • Sorry @jimmykup! Forgot about the hyphen. I've edited the answer above with `\-`. Example updated too. Check it now. Should work. – Jason Gennaro Sep 23 '11 at 13:29
  • Works great! I noticed it was breaking on commas too, though, as in font-family: Arial, Helvetica; but I know enough about RegEx to figure out how to add that in there as well. Thanks! – jkupczak Sep 23 '11 at 14:31
  • Excellent @jimmykup! Glad it worked out. I've edited the answer to include the fix for the commas. Thx for catching. – Jason Gennaro Sep 23 '11 at 15:06
  • I would also recommend adding `\#` to capture styles that include color-related rules, like `background: #ff0000` – ThoughtProcess Oct 14 '13 at 17:44
  • `style="[a-zA-Z0-9:;\.\s\(\)\-\,\'\#]*"` this did it for me, it was missing the single quote character and the # sign. – Peter May 09 '15 at 16:19
  • In my situation (trying to strip styles from email newsletter where a CSS inliner had been used) I found this to be a bit more helpful because it also captures some common symbols used in inline css (#, %, ' etc.): `/(style=")([a-zA-Z0-9:;\.\s\(\)\-\,\!\#\'\%]*)(")/gi` – Nicholas Byfleet Feb 08 '16 at 20:25
  • Fails for attributes surrounded with *single-quotes*:`style='abc'` – vsync Nov 10 '20 at 20:43
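A short JavaScript sketch of how the two patterns in this answer behave (the sample strings are made up for illustration). The first pattern removes the attribute. The grouped version can re-emit `$1` and `$3` to keep `style=""` while emptying its value. And, as the comments point out, any character outside the whitelist, such as `#`, stops the match entirely.

```javascript
// Whitelist pattern from the answer (including the hyphen and comma fixes).
const whitelist = /style="[a-zA-Z0-9:;\.\s\(\)\-\,]*"/gi;
// Capturing-group variant: $1 = 'style="', $2 = the value, $3 = the closing quote.
const grouped = /(style=")([a-zA-Z0-9:;\.\s\(\)\-\,]*)(")/gi;

const plain = '<td style="text-align: center; font-family: Arial, Helvetica">x</td>';
const withHash = '<td style="background: #ff0000">x</td>';

// Remove the whole attribute; the space before it is left behind.
console.log(plain.replace(whitelist, ""));    // <td >x</td>
// Keep the attribute name but empty its value by re-emitting groups 1 and 3.
console.log(plain.replace(grouped, "$1$3"));  // <td style="">x</td>
// '#' is not in the whitelist, so nothing matches and the string is unchanged.
console.log(withHash.replace(whitelist, "")); // <td style="background: #ff0000">x</td>
```

Using a negated class such as `[^"]*` for the value avoids having to enumerate every character CSS might contain.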

Try this; it will remove the style attribute and its value completely:

const regex = /style="(.*?)"/gm;
const str = `<div class="frame" style="font-family: Monaco, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; background-color: rgb(245, 245, 245);">some text</div>`;
const subst = ``;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
Sachin Gaur
  • Do `/\s*style=(["'])(.*?)\1/gmi` to be more flexible: it supports apostrophes, removes the extra space before the attribute, and ignores case. – xmedeko Oct 14 '19 at 15:41
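A small JavaScript check of the pattern suggested in the comment above (`stripStyle` is a hypothetical wrapper). The captured quote `(["'])` must be matched again by `\1`, so a single-quoted value that contains double quotes is still removed in one piece. One caveat: `.` does not match newlines, so a value that spans several lines is left alone. The `m` flag does not change that, since it only affects `^` and `$`.

```javascript
// Pattern from the comment: (["']) captures the opening quote and \1 requires
// the same character to close the value.
const styleAttr = /\s*style=(["'])(.*?)\1/gmi;

// Hypothetical wrapper around String.prototype.replace.
function stripStyle(html) {
  return html.replace(styleAttr, "");
}

console.log(stripStyle('<div class="frame" style="color: red;">a</div>'));
// <div class="frame">a</div>
console.log(stripStyle(`<span STYLE='font-family: "Courier New"'>b</span>`));
// <span>b</span>
```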

In Visual Studio's Find and Replace, this is what I use to remove style and class attributes:

\s*(style|class)="[^"]*\n*"

This removes the leading whitespace along with the style or class attribute. It matches anything except a double quote inside the attribute value, then any newline(s) in case the value spans multiple lines, and finally the closing double quote.

PersyJack

I tried Jason Gennaro's regular expression and slightly modified it:

/style="[a-zA-Z0-9:;&\."\s\(\)\-\,]*|\\/ig

This regular expression also handles values containing `&quot;` entities, for example:

 <div class="frame" style="font-family: Monaco, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; background-color: rgb(245, 245, 245);">some text</div>
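A JavaScript check of this pattern (the second sample string is made up for illustration). It does remove the attribute from the example above. However, because `"` is inside the character class, the match does not stop at the closing quote of the value, so an attribute that follows `style` can be partly consumed. The trailing `|\\` alternative matches a single backslash on its own, so it also deletes backslashes anywhere in the input.

```javascript
// Pattern from this answer, unchanged.
const modified = /style="[a-zA-Z0-9:;&\."\s\(\)\-\,]*|\\/ig;

const sample = '<div class="frame" style="font-family: Monaco, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; background-color: rgb(245, 245, 245);">some text</div>';
console.log(sample.replace(modified, ""));
// <div class="frame" >some text</div>

// '"', space and letters are all in the class, so the match runs into title=...
const followed = '<p style="color: red" title="x">t</p>';
console.log(followed.replace(modified, ""));
// <p ="x">t</p>
```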

This expression works for me:

/style=".+"/ig
Dmitry Matrosov

The following expression should remove a style attribute along with everything inside it; crucially, it works whether the attribute uses double or single quotes:

/style=("|')(?:[^\1\\]|\\.)+?\1/gi

This splits the capture groups so that they can match on single or double-quotes, and then capture anything in between, including URL-encoded characters & line breaks, whilst leaving other attributes (like classes or names) intact.

Tested here: https://regexr.com/4rovf

indextwo
  • In JavaScript, a backreference does not work inside a character class: there, \1 === \x01. Try swapping the order: `/style=("|')(?:\\\1|.)+?\1/gi` – Lauri Feb 06 '23 at 15:06
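A quick JavaScript check of the comment above (strings made up for illustration). Because `[^\1\\]` really means "anything except U+0001 or a backslash", the class does not exclude the quote. The lazy `+?` still stops at the first matching quote for ordinary values. With an empty value such as `style=""`, though, the closing quote is consumed as content and the match runs on into the next attribute.

```javascript
// Inside a character class (non-Unicode mode), \1 is the legacy octal escape
// for U+0001, not a backreference.
console.log(/[^\1]/.test('"'));    // true: a quote is "not U+0001"
console.log(/[^\1]/.test("\x01")); // false

// Pattern from the answer, unchanged.
const quotedStyle = /style=("|')(?:[^\1\\]|\\.)+?\1/gi;

// Ordinary value: the lazy +? stops at the first matching quote.
const singleQuoted = `<p style='color: red' class="x">t</p>`;
console.log(singleQuoted.replace(quotedStyle, "")); // <p  class="x">t</p>

// Empty value: the closing quote is consumed as content, so the match
// continues up to the quote that opens class="x".
const emptyValue = '<p style="" class="x">t</p>';
console.log(emptyValue.replace(quotedStyle, "")); // <p x">t</p>
```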

Try this:

(style|class)=(["'])(.*?)(["'])
foad abdollahi

This works with Perl. You may need to adapt the regex slightly to ASP's regex rules, but it should work for any tag.

$file=~ s/(<\s*[a-z][a-z0-9]*.*\s)(style\s*=\s*".*?")([^<>]*>)/$1 $3/sig;

Here $file holds the contents of an HTML file.

Here is the same thing in .NET C#:

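      // Requires: using System.Text.RegularExpressions;
      string subjectString = "<html style=\"something\"> ";

      // $1 = tag up to the style attribute, $2 = the style attribute, $3 = rest of the tag.
      // Replacing with "$1 $3" drops $2.
      string resultString = Regex.Replace(subjectString, @"(<\s*[a-z][a-z0-9]*.*\s)(style\s*=\s*"".*?"")([^<>]*>)", "$1 $3", RegexOptions.Singleline | RegexOptions.IgnoreCase);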

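Result: <html  > (the space captured in $1 plus the space in the "$1 $3" replacement; the input's trailing space is kept too)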

FailedDev