
I have a JavaScript variable that contains the contents of an HTML page. I would like to remove an inline <style type="text/css"> ... </style> block from it. I asked before and it was suggested that I add this to the DOM.

Is there a simpler way to remove this using a regular expression? I need to match <style> as the start and </style> as the end. I've heard about regex, but I'm not even sure it can be used with JavaScript.

Samantha J T Star
  • JavaScript has its own regex for sure, but why don't you make one or more CSS classes (which are also more reusable) containing everything in the <style> block? Then you can remove them easily with jQuery's removeClass() function – Venzentx Jun 19 '14 at 15:12
  • If all else fails, you can always use substring to remove it – Huangism Jun 19 '14 at 15:12
  • Could there be more than 1 style declaration in your value? – Ingmars Jun 19 '14 at 15:12
  • I'll just leave this here: [link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – jupenur Jun 19 '14 at 15:24
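Taking the `substring` suggestion literally, a minimal sketch could locate the tags with `indexOf` and cut them out with `slice` (a close cousin of `substring`). It assumes a single, lowercase `<style>` block whose CSS does not contain the text `</style>`; `removeFirstStyleBlock` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: removes the first <style ...>...</style> block
// using plain string search instead of a regular expression.
// Assumes lowercase tag names (indexOf is case-sensitive) and no
// "</style>" text inside the CSS itself.
function removeFirstStyleBlock(html) {
  const start = html.indexOf("<style");
  if (start === -1) return html; // no opening tag found

  const closeTag = "</style>";
  const end = html.indexOf(closeTag, start);
  if (end === -1) return html; // unterminated block: leave the string alone

  // Keep everything before the opening tag and after the closing tag.
  return html.slice(0, start) + html.slice(end + closeTag.length);
}

console.log(
  removeFirstStyleBlock('<p>a</p><style type="text/css">p { color: red; }</style><p>b</p>')
); // prints "<p>a</p><p>b</p>"
```

Only the first block is removed; calling it in a loop until the string stops changing would remove the rest.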

4 Answers


Ingmars has the right idea, except that it was missing an important question mark (to make the match lazy), some additional HTML/XML possibilities (such as whitespace allowed after the tag name in both the opening and closing tags, and attributes in the opening tag), and it replaced the match with a message (I'm assuming that you just want to delete it completely).

This will work unless an attribute contains ">", which is a calculated risk. The code assumes that htmlString is the variable containing your HTML document.

htmlString = htmlString.replace(/<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi, '');
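To see what the pattern above covers, here it is applied to a small made-up sample with an uppercase tag name, attributes, whitespace before the closing `>`, and a `>` inside the CSS. The second part shows one input it leaves alone: a `<` inside an attribute value, which `[^<>]*` cannot cross.

```javascript
// The pattern from this answer.
const styleRe = /<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi;

// Made-up sample: uppercase tag, attributes, "</style >" with a space,
// and a ">" child combinator inside the CSS.
const sample =
  '<head><STYLE type="text/css" media="screen">ul > li { color: red; }</style >' +
  "<title>t</title><style>p { margin: 0; }</style></head>";

console.log(sample.replace(styleRe, "")); // prints "<head><title>t</title></head>"

// A "<" inside an attribute value stops [^<>]*, so no match starts at
// this tag and the string comes back unchanged.
const risky = '<style title="a<b">p { margin: 0; }</style>';
console.log(risky.replace(styleRe, "") === risky); // prints true
```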
Joseph Myers
  • Your first `*` still looks a bit too greedy. And it'll match ` – jupenur Jun 19 '14 at 15:41
  • It's OK for the first `[^<>]` to be greedy, because there is no chance for it to get beyond the end of the tag, since both `>` and `<` are not allowed (the second one is also illegal). As far as matching your example, there is no tag name beginning with the substring style other than style, so we are safe in isolating the matching of style tags. You are right that no validation is being done of the HTML in the document, but it is well known that such a task is impossible in regular expressions as they are. – Joseph Myers Jun 19 '14 at 15:44
  • This: "such a task is impossible in regular expressions". See the comment I left on the question? – jupenur Jun 19 '14 at 15:47
  • @JosephMyers - I just rechecked and there is – Samantha J T Star Jun 19 '14 at 15:48
  • @SamanthaJ Yes, my version will also handle this (as well as any other attributes there might be, like `media`). – Joseph Myers Jun 19 '14 at 15:49
  • @JosephMyers oh and regarding the validity of tag names... Have you heard about [web components](http://w3c.github.io/webcomponents/spec/custom/)? – jupenur Jun 19 '14 at 15:50
  • Yes, things like `pro` and `con`, and whatever you want. If you are worried about those appearing in your HTML, then simply put a word break in the expression after style to guarantee the tag name itself is nothing but style. I'll add that to my code. – Joseph Myers Jun 19 '14 at 15:52
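A caveat worth checking on the `\b` fix: in JavaScript, `\b` only separates word characters (`[A-Za-z0-9_]`) from everything else, and `-` is a non-word character, so `<style\b` still matches the start of a hyphenated custom element. The sketch below uses a hypothetical element name `style-pro` and swaps in a lookahead as one possible stricter alternative; the lookahead version is an assumption, not something proposed in the thread.

```javascript
// The pattern from the answer, with the \b word boundary after "style".
const withBoundary = /<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi;

// Alternative (an assumption, not from the thread): require whitespace,
// ">" or "/" right after the tag name instead of a word boundary.
const withLookahead = /<style(?=[\s>\/])[^<>]*>[\s\S]*?<\/style\s*>/gi;

// "style-pro" is a hypothetical custom element name. Because "-" is not a
// word character, \b matches between "style" and "-pro", and the lazy body
// then runs on to the real </style>.
const customInput = "<style-pro>keep me</style-pro><style>p { margin: 0; }</style>";

console.log(customInput.replace(withBoundary, ""));  // prints ""
console.log(customInput.replace(withLookahead, "")); // prints "<style-pro>keep me</style-pro>"
```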

If it's just one set of <style> tags, then a JavaScript regular expression would work just fine:

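var re = /(<style\b[^>]*>)[^<>]*(<\/style>)/i; // To remove ALL style tags, change the i at the end to gi.
var html = "<!DOCTYPE html>..."; // Your HTML string
// Note: [^<>]* cannot cross a "<" or ">", so a <style> block whose CSS
// contains ">" (e.g. a child combinator) will not be matched.

html = html.replace(re, "");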

This solution isn't practical if you want to target specific <style> tags, though: you can only remove the first match or all matches.
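The difference between removing the first match and removing all matches comes down to the `g` flag. This sketch runs the pattern from this answer both ways on a made-up string with two `<style>` blocks, and also shows the case raised in the comments on this answer: CSS containing `>` (a child combinator) is not matched by `[^<>]*`, so that block survives.

```javascript
// Matt's pattern, without and with the g flag.
const firstOnly = /(<style\b[^>]*>)[^<>]*(<\/style>)/i;
const allMatches = /(<style\b[^>]*>)[^<>]*(<\/style>)/gi;

const twoBlocks = "<style>a { color: red; }</style><p>x</p><style>b { color: blue; }</style>";

console.log(twoBlocks.replace(firstOnly, ""));  // prints "<p>x</p><style>b { color: blue; }</style>"
console.log(twoBlocks.replace(allMatches, "")); // prints "<p>x</p>"

// CSS containing ">" cannot be crossed by [^<>]*, so this block is kept.
const childCombinator = "<style>ul > li { margin: 0; }</style>";
console.log(childCombinator.replace(allMatches, "") === childCombinator); // prints true
```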

Matt
  • Can you explain what you mentioned: "You can only remove the first match or all matches"? In your example, would it remove the first or all? – Samantha J T Star Jun 19 '14 at 15:36
  • What about something like ``? See the `>` there? – jupenur Jun 19 '14 at 15:38
  • Sure. A regular expression stops at the first match it finds, **unless** you specify the `g` flag (as in `gi`) at the end of the pattern. With `g`, it keeps going after the first match and finds everything in the string that matches. – Matt Jun 19 '14 at 15:39
  • @jupenur Well spotted. Didn't consider that character. – Matt Jun 19 '14 at 15:41
  • @Matt - I just rechecked and there is – Samantha J T Star Jun 19 '14 at 15:48

Simple regex which will wipe it with no regrets:

var a = 'aaaa <style type="text/css" favouriteAnimal="horse">style</StYlE> bbbbb <styLE>another style</STyle> cccc';
var b = a.replace( /<style[\s\S]*?>[\s\S]*?<\/style>/gi, '' );
console.log( b );

EDIT: updated my answer to handle the specifics of the current question.
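Why the lazy `*?` in the body matters (as pointed out in the comments on this answer) is easiest to see side by side. The `greedy` pattern is a hypothetical variant written only for this comparison; the `lazy` one is the pattern from the answer. The sample page, with stylesheets at the top and bottom, is made up.

```javascript
// Hypothetical greedy variant: [\s\S]* runs to the LAST </style> in the string.
const greedy = /<style[\s\S]*?>[\s\S]*<\/style>/gi;
// The answer's lazy body: [\s\S]*? stops at the FIRST </style> after the tag.
const lazy = /<style[\s\S]*?>[\s\S]*?<\/style>/gi;

const topAndBottom = "<style>a{}</style><p>important content</p><style>b{}</style>";

console.log(topAndBottom.replace(greedy, "")); // prints ""
console.log(topAndBottom.replace(lazy, ""));   // prints "<p>important content</p>"
```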

Ingmars
  • Your regexp needs to be lazy (`[\s\S]*?`), or you are going to gobble up everything from the first stylesheet on the page to the end of the last one. On some web pages this will devour the entire page as well, because they have stylesheets at both the top and the bottom. – Joseph Myers Jun 19 '14 at 15:32
  • @JosephMyers: good catch, I've updated my code, and learned a bit myself. Thanks! – Ingmars Jun 19 '14 at 15:41
  • Thanks. In fact, my code isn't perfect either. @jupenur makes a good point with his link: there are always failure cases when trying to do anything with HTML without actually parsing it, and parsing it is impossible with regular expressions. – Joseph Myers Jun 19 '14 at 15:47

Following the advice of bobince (as recommended by jupenur), use a real parser: load the string into a separate HTML document, find all <style> tags, remove them, and retrieve the HTML. It'll work every time. Here's an example:

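// Create a separate document so the HTML is parsed but not rendered on the page.
var im = document.implementation;
var doc = 'createHTMLDocument' in im ?
    im.createHTMLDocument('') : new ActiveXObject("htmlfile"); // fallback for old IE
if(!doc.body)
    doc.write('<body></body>'); // make sure there is a body to write into
doc.body.innerHTML = '<p><style type="text/css"></style></p><p>Hii</p>';
// getElementsByTagName returns a live collection, so removing temp[0]
// shrinks it until no <style> elements remain.
var temp=doc.getElementsByTagName('style');
while(temp.length)
    temp[0].parentNode.removeChild(temp[0]);
console.log(doc.body.innerHTML); // '<p></p><p>Hii</p>'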

If you don't do that, you could unintentionally remove content that only looks like a style tag, such as text inside comments or essential code inside script tags (e.g. $('body').append('<style>p { color: blue; }</style>');).
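The script-tag risk can be reproduced without a browser. This sketch applies Joseph Myers' pattern from another answer to a made-up page whose script contains the jQuery call quoted above; the string literal inside the script is stripped along with the real stylesheet.

```javascript
// Made-up page: a real stylesheet plus a script that builds one at runtime.
const pageWithScript =
  "<style>h1 { color: red; }</style>" +
  "<script>$('body').append('<style>p { color: blue; }</style>');</script>";

// Joseph Myers' pattern from another answer.
const josephRe = /<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi;

// Both matches are removed, including the string literal inside the
// script, so the script now appends an empty string.
console.log(pageWithScript.replace(josephRe, ""));
// prints "<script>$('body').append('');</script>"
```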

May the <center> tag hold.

Pluto