6

I know my question might look like a duplicate of this question, but it's not.
I am trying to match a class name inside HTML text that comes from the server as a template, using a JavaScript RegExp, and replace it with another class name. Here is what the code looks like:

<div class='a b c d'></div>
<!-- or -->
<div class="a b c d"></div>
<!-- There might be spaces after and before the = (the equal sign) -->

I want to match the class "b", for example, with the highest performance possible.

Here is a regular expression I used, but it does not work in all cases and I don't know why:

  var key = 'b';
  statRegex = new RegExp('(<[\w+ class="[\\w\\s]*?)\\b('+key+')\\b([\\w\\s]*")');
  html.replace( statRegex,'SomeOtherClass');// I may be mistaken in the way I am replacing it here
Richard Garside
Evan Lévesque
  • I know it is not faster than DOM manipulation, but I am getting text from the server, not a DOM, and I am working with a special framework – Evan Lévesque May 15 '13 at 07:42
  • Turning HTML into a DOM element is easy though ;-) – Ja͢ck May 15 '13 at 08:01
  • @AymanJitan You're being ignorant of the fact that HTML is not regular and therefore not a good fit for regular expressions. See http://htmlparsing.com/regexes.html. Using the DOM is obviously the best way to go. – Bart May 15 '13 at 08:13
  • @Bart I have widgets. Each widget is made of many HTML templates, and inside each widget there are components with different states. The states are controlled by CSS classes, and I need to parse the HTML and display the components and widgets in different states. It's hard to explain, though. I know that the DOM is much faster, but it will not work in my case – Evan Lévesque May 15 '13 at 08:30
  • @AymanJitan I can imagine it's complex, but you're missing the point I'm trying to make. It's not so much about speed. It's about getting the correct results now and in the future. A regex will very likely fail when the formatting of the HTML changes, whereas the DOM will give you reliable results. – Bart May 15 '13 at 08:48
  • @Bart Thanks, I really understand. The HTML here is formatted and minified by a special loader, all in the same way. – Evan Lévesque May 15 '13 at 11:03
  • I still don't get it. The replacement you want is super simple to make in DOM. After the change, use innerHTML to make a string again. – Ja͢ck May 15 '13 at 11:55

5 Answers

5

Using a regex, this pattern should work for you:

var r = new RegExp("(<\\w+?\\s+?class\\s*=\\s*['\"][^'\"]*?\\b)" + key + "\\b", "i");
//                  Λ                                         Λ                  Λ
//                  |_________________________________________|                  |
//                          ____________|                                        |
//[Creating a backreference]                                                     |
//[which will be accessible]  [Using "i" makes the matching "case-insensitive".]_|
//[using $1 (see examples).]  [You can omit "i" for case-sensitive matching.   ]

E.g.

var oldClass = "b";
var newClass = "e";
var r = new RegExp("..." + oldClass + "...");

"<div class='a b c d'></div>".replace(r, "$1" + newClass);
    // ^-- returns: <div class='a e c d'></div>
"<div class=\"a b c d\"></div>".replace(r, "$1" + newClass);
    // ^-- returns: <div class="a e c d"></div>    
"<div class='abcd'></div>".replace(r, "$1" + newClass);
    // ^-- returns: <div class='abcd'></div>     // <-- NO change

NOTE:
For the above regex to work there must be no ' or " inside the class string.
I.e. <div class="a 'b' c d"... will NOT match.
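
Two further limitations are worth keeping in mind: the pattern expects class to be the first attribute after the tag name, and \b treats a hyphen as a word boundary, so the key b would also match inside a-b. Below is a minimal sketch of one way around both, assuming attribute values never contain > and that the markup is as uniform as the question describes. The name replaceClass is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper (illustration only): swaps one class name inside every
// class attribute of an HTML string, leaving the rest of the markup as it was.
// Assumption: no attribute value contains ">", and class values sit on one line.
function replaceClass(html, oldClass, newClass) {
    // $1: tag start up to and including "class =", $2: the quote, $3: the value
    var classAttr = /(<\w+\b[^>]*?\sclass\s*=\s*)(["'])(.*?)\2/gi;
    return html.replace(classAttr, function (match, prefix, quote, value) {
        // Split on whitespace but keep the separators so spacing is preserved.
        var tokens = value.split(/(\s+)/).map(function (token) {
            return token === oldClass ? newClass : token;
        });
        return prefix + quote + tokens.join('') + quote;
    });
}

console.log(replaceClass("<div class='a b c d'></div>", 'b', 'e'));
// <div class='a e c d'></div>
console.log(replaceClass('<div id="x" class = "a b"></div>', 'b', 'e'));
// <div id="x" class = "a e"></div>
console.log(replaceClass('<div class="a-b abcd"></div>', 'b', 'e'));
// <div class="a-b abcd"></div>   (no change)
```

Comparing whole whitespace-separated tokens instead of using \b also means the class name does not have to be escaped before it goes into the pattern.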

gkalpak
  • That's a whole new question :) Please edit your question to clearly state what you are trying to achieve (I'll update my answer). – gkalpak May 15 '13 at 08:44
  • Sorry for that, but this regular expression matches the entire element; I am after matching the class only – Evan Lévesque May 15 '13 at 08:45
3

Use the browser to your advantage:

var str = '<div class=\'a b c d\'></div>\
<!-- or -->\
<div class="a b c d"></div>\
<!-- There might be spaces after and before the = (the equal sign) -->';

var wrapper = document.createElement('div');
wrapper.innerHTML = str;

var elements = wrapper.getElementsByClassName('b');

if (elements.length) {
    // there are elements with class b
}

Demo

Btw, getElementsByClassName() is not supported in IE before version 9; check this answer for an alternative.

Ja͢ck
  • +1 Nice one. The only downside is that IE < 9 does not support `getElementsByClassName`. – Bart May 15 '13 at 08:02
  • @Bart Doesn't support what exactly? Oh, you mean `getElementsByClassName`? – Ja͢ck May 15 '13 at 08:02
  • This could be the perfect solution, but not in my case. I am making some changes to the HTML before I inject it into the page, and the DOM will not help me in the framework I am working with. – Evan Lévesque May 15 '13 at 08:03
  • @AymanJitan So make the changes in DOM? It's not very clear how the framework is stopping you from doing this. – Ja͢ck May 15 '13 at 08:04
  • @AymanJitan Make changes and get `wrapper.innerHTML`. Nothing simpler than that. – Bart May 15 '13 at 08:06
3

Test it here: https://regex101.com/r/vnOFjm/1

regexp: (?:class|className)=(?:["']\W+\s*(?:\w+)\()?["']([^'"]+)['"]

const regex = /(?:class|className)=(?:["']\W+\s*(?:\w+)\()?["']([^'"]+)['"]/gmi;
const str = `<div id="content" class="container">

<div style="overflow:hidden;margin-top:30px">
  <div style="width:300px;height:250px;float:left">
<ins class="adsbygoogle turbo" style="display:inline-block !important;width:300px;min-height:250px; display: none !important;" data-ad-client="ca-pub-1904398025977193" data-ad-slot="4723729075" data-color-link="2244BB" qgdsrhu="" hidden=""></ins>


<img src="http://static.teleman.pl/images/pixel.gif?show,753804,20160812" alt="" width="0" height="0" hidden="" style="display: none !important;">
</div>`;

let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
FDisk
1

Regular expressions are not a good fit for parsing HTML. HTML is not regular.

jQuery can be a very good fit here.

var html = 'Your HTML here...';

$('<div>' + html + '</div>').find('[class~="b"]').each(function () {
    console.log(this);
});

The selector [class~="b"] will select any element that has a class attribute containing the word b. The initial HTML is wrapped inside a div to make the find method work properly.
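
To make the "containing the word b" part concrete, here is a rough plain-JavaScript sketch of the test that selector performs on a single class attribute value, next to a \b-based regex for comparison. The function name hasClassWord is made up for this illustration, and the sketch assumes the word itself is non-empty and contains no whitespace.

```javascript
// Hypothetical helper: true when `word` is one of the whitespace-separated
// tokens in `classValue`, which is what [class~="word"] checks for.
function hasClassWord(classValue, word) {
    return classValue.split(/\s+/).indexOf(word) !== -1;
}

console.log(hasClassWord('a b c d', 'b'));   // true
console.log(hasClassWord('abcd', 'b'));      // false
console.log(hasClassWord('a-b b-c', 'b'));   // false

// A \b-based regex is looser, because "-" counts as a word boundary:
console.log(/\bb\b/.test('a-b b-c'));        // true
```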

Bart
-1

This may not be a solution for you, but if you aren't set on using a full regex match, you could do (assuming your examples are representative of the data you will be parsing) :

function hasTheClass(html_string, classname) {
    //!!~ turns -1 into false, and anything else into true. 
    return !!~html_string.split("=")[1].split(/[\'\"]/)[1].split(" ").indexOf(classname);
}

hasTheClass("<div class='a b c d'></div>", 'b'); //returns true
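
For anyone unfamiliar with the !!~ idiom, the short sketch below shows what it evaluates to. It also shows what happens when class is not the first attribute, which is where the "assuming your examples are representative" caveat bites. The function is copied from above (only the redundant escapes in the regex are dropped); the second input string is a made-up example.

```javascript
// Same function as in the answer, repeated so this snippet runs on its own.
function hasTheClass(html_string, classname) {
    return !!~html_string.split("=")[1].split(/['"]/)[1].split(" ").indexOf(classname);
}

// ~x is -(x + 1), so only an index of -1 ("not found") becomes 0, i.e. falsy.
console.log(~['a', 'b'].indexOf('b'));   // -2 (truthy)
console.log(~['a', 'b'].indexOf('z'));   // 0  (falsy)

console.log(hasTheClass("<div class='a b c d'></div>", 'b'));     // true
// split("=")[1] is the text after the FIRST "=", here the id value, not the class:
console.log(hasTheClass('<div id="x" class="a b"></div>', 'b'));  // false
```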
dave