1

For example, I have this HTML:

<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper">< div class="modal-content"><div class="modal-body full long"><div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>

How can I remove all attributes (style, class, id, etc.) from this HTML?

I have this regex:

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i

What is wrong with it? How can I delete all HTML attributes using a regex?

Here is a fiddle:

http://jsfiddle.net/qL4maxn0/1/

Huangism
brabertaser19

3 Answers

6

First of all, I would advise against using regexes in this situation: they are not meant to parse tree-shaped structures like HTML.

If you don't have a choice, however, I think a regex can handle this particular problem.

It looks to me like you forgot to account for spaces, accents, etc. You can use the fact that the greater-than (>) and less-than (<) signs are not allowed as raw text.

/<\s*([a-z][a-z0-9]*)\s.*?>/gi

and call it with:

result = body.replace(regex, '<$1>')

For your given sample, it produces:

<title>Ololo - text’s life</title><div><div><div><div><div><div><div>olololo<ul><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p>bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div></div></div>
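As a rough illustration of the regex above on a smaller input, here is a minimal sketch. The helper name `stripAttributesWithRegex` and the `sample` string are made up for this example and are not part of the original answer. The second call shows a limitation to keep in mind: the pattern assumes no `>` appears inside an attribute value.

```javascript
// Regex from the answer: capture the tag name, then lazily skip to the next '>'.
const regex = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

// Hypothetical helper name, used only for this sketch.
function stripAttributesWithRegex(html) {
  return html.replace(regex, '<$1>');
}

// Hypothetical sample input.
const sample = '<div class="a b" id="x"><p style="color: red;">hi</p><br /></div>';
console.log(stripAttributesWithRegex(sample));
// <div><p>hi</p><br></div>

// Limitation: a '>' inside an attribute value ends the match too early.
console.log(stripAttributesWithRegex('<a title="a > b">x</a>'));
// <a> b">x</a>
```

Note that the self-closing slash in `<br />` is dropped as well, because only the tag name is kept in the replacement.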
bwegs
Willem Van Onsem
4

You should not use a regex here. Let the browser parse the HTML and remove the attributes through the DOM instead:

var html = '<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper"><div class="modal-content"><div class="modal-body full long">                        <div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li>                            </ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div><div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>';
var div = document.createElement('div');
div.innerHTML = html;

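// Remove every attribute from a single element. Iterate backwards
// because element.attributes is a live collection that shrinks as we go.
function removeAllAttrs(element) {
    for (var i = element.attributes.length; i-- > 0;) {
        element.removeAttributeNode(element.attributes[i]);
    }
}

// Recursively strip attributes from all descendants of el
// (el itself, the wrapper div, is left untouched).
function removeAttributes(el) {
    var children = el.children;
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        removeAllAttrs(child);
        if (child.children.length) {
            removeAttributes(child);
        }
    }
}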
removeAttributes(div);
console.log(div.innerHTML);
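
A possible variant of the code above, sketched under a few assumptions: it parses the markup into a `<template>` element instead of a `div`, so the nodes live in a document fragment. My understanding is that template contents are inert (scripts do not run and images are not loaded while parsing), but treat that as an assumption to verify for your target browsers. The names `removeAttributesDeep` and `stripAttributes` are invented for this sketch.

```javascript
// Remove every attribute from one element by name.
function removeAllAttrs(element) {
  while (element.attributes.length > 0) {
    element.removeAttribute(element.attributes[0].name);
  }
}

// Walk all descendants of root (an element or a document fragment)
// and strip their attributes. The root itself is not modified.
function removeAttributesDeep(root) {
  for (const child of Array.from(root.children)) {
    removeAllAttrs(child);
    removeAttributesDeep(child);
  }
}

// Hypothetical wrapper: parse into a <template>, clean, serialize back.
function stripAttributes(html) {
  const template = document.createElement('template');
  template.innerHTML = html;
  removeAttributesDeep(template.content);
  return template.innerHTML;
}

// Only runs where a DOM is available (for example, in a browser).
if (typeof document !== 'undefined') {
  console.log(stripAttributes('<div class="a" id="b"><p style="color:red">hi</p></div>'));
}
```

This is not a sanitizer: it removes attributes but keeps every element, including `script` elements, so it should not be relied on for security on its own.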


Community
Mr_Green
  • I recommend using `documentFragment` instead of a `div`, but that's just an implementation detail. – zzzzBov Mar 09 '15 at 14:07
  • @zzzzBov Thanks for the info (_I was not aware of this_). Somehow I am not able to get the result if I do so. – Mr_Green Mar 09 '15 at 14:10
  • You can't just drop in [document fragments](https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment) as a replacement for `div` elements, but the idea is that the `div` adds context as a `body` element, where the document fragment is context free and may represent *any* fragment of HTML. – zzzzBov Mar 09 '15 at 14:14
  • I think using `div` is a vector for XSS, eg if someone tried ``. Assume that since a document fragment is context free, it wouldn't try to load the image (and hence fire the JS) as soon as you add the html to it? – James Thorpe Mar 09 '15 at 19:39
1

You're missing the g flag to make the replace global.

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig

Also, if you're doing this for security purposes, look into using a proper HTML sanitizer: Sanitize/Rewrite HTML on the Client Side
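
To make the effect of the g flag concrete, here is a small sketch comparing the two patterns. The replacement string `'<$1$2>'` and the `sample` input are assumptions for illustration; the question does not show how the regex was applied.

```javascript
// The question's regex, without and with the global flag.
const firstOnly = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/i;
const allTags = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig;

// Hypothetical sample input.
const sample = '<div class="a"><p id="b">hi</p></div>';

// Without g, replace() stops after the first match.
const resultFirstOnly = sample.replace(firstOnly, '<$1$2>');
console.log(resultFirstOnly);
// <div><p id="b">hi</p></div>

// With g, every opening tag is rewritten.
const resultAll = sample.replace(allTags, '<$1$2>');
console.log(resultAll);
// <div><p>hi</p></div>
```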

Community
Mike Samuel
  • https://regex101.com/r/wR5wC4/1 - so it didn't get all classes styles etc, for example with multiple classes... what i do wrong? – brabertaser19 Mar 11 '15 at 09:35
  • 1
    @brabertaser1992, Yeah. I was pointing out why your original regex doesn't work on multiple tags. But there are other problems. For example, `">` will slip through. See the comments above on trying to parse HTML with regexes without reading the language specification first. – Mike Samuel Mar 11 '15 at 20:35