How to search for class names in an HTML file using regex?

Question

I want to get all the class names in an HTML file. What regex can I use? I don't understand how to build it. I have this code:

HTML

<html>
    <div class="myFirstClass"></div>
    <div class="mySecondClass2"></div>
</html>

I want to know how to get:

myFirstClass
mySecondClass2

using regex. I tried class=".*", but it matches everything, even text outside the name.

First of all, what language are you using ? Have you considered using an HTML parser ? — HamZa, Aug 16 '13 at 13:19
[Mandatory Read](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) *(someone had to do it and it's still amazingly fun to read)* — Lieven Keersmaekers, Aug 16 '13 at 13:22
Adding this as a comment rather than just editing your post, but your HTML is invalid (the closing tags on the divs). — Chris McAtackney, Aug 16 '13 at 13:27
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See: [Stack Overflow question checklist](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist). Also, every regex question should state the language/flavor being used, since there are some differences in syntax and capabilities between flavors. **To all the guys who answered** with their "favorite" language: **do not answer** because it's **useless** for the OP and even for future visitors. — HamZa, Aug 16 '13 at 13:33
@Amith Why in the world did you change the tags to JS and jQuery ? — HamZa, Aug 16 '13 at 13:56
@LievenKeersmaekers: Please don't post links to that question, because they are not helpful to the reader, unless you follow it up with something that is an answer they can use. *You* know the point of the comment and that wall of text is that parsing HTML with regexes is a bad idea. However, to someone else who is asking, that is not at all clear. Worse, it doesn't point the reader to any useful solutions that *can* help parse HTML reliably. — Andy Lester, Aug 16 '13 at 15:28

score 2 · Answer 1 · answered Aug 16 '13 at 13:20

Your attempt, class=".*", is on the right track, but the main problem is that * is "greedy": it takes as many characters as it can, so the match probably runs all the way to the last quotation mark on the line.
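To make the greedy behaviour concrete, here is a small sketch in JavaScript. The input line (a div with an extra id attribute) is a made-up example, not taken from the question; it is only there to put a second quoted attribute on the same line. The lazy quantifier .*? is one common way to stop at the first closing quote.

```javascript
// Hypothetical input: one tag with two quoted attributes on the same line.
const line = '<div class="myFirstClass" id="main"></div>';

// Greedy: .* runs on to the LAST quotation mark on the line.
const greedy = line.match(/class=".*"/)[0];

// Lazy: .*? stops at the FIRST quotation mark that lets the match succeed.
const lazy = line.match(/class=".*?"/)[0];

console.log(greedy); // class="myFirstClass" id="main"
console.log(lazy);   // class="myFirstClass"
```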

One option is to use \w instead of ., to only retrieve word characters. Depending on the language you're using, I would think an HTML parser might be a better option. Many languages have such libraries available.

NOTE: Unless your usage is pretty basic, a regex with \w would also need to account for multiple space-separated class names.
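A rough sketch of that idea in JavaScript, assuming the class attributes are always written with double quotes. The input string and the class name side-note are invented for illustration. Note that \w does not match a hyphen, so the character class below adds it explicitly; that choice is an assumption about which names you expect to see.

```javascript
// Hypothetical input; the second div carries two class names.
const html = `
<div class="myFirstClass"></div>
<div class="mySecondClass2 side-note"></div>`;

// [\w\s-]* accepts word characters, whitespace, and hyphens between the quotes.
const pattern = /class="([\w\s-]*)"/g;

const classNames = [];
for (const match of html.matchAll(pattern)) {
    // match[1] is the raw attribute value; split it on runs of whitespace.
    classNames.push(...match[1].split(/\s+/).filter(Boolean));
}

console.log(classNames); // [ 'myFirstClass', 'mySecondClass2', 'side-note' ]
```

The filter(Boolean) call drops the empty strings that split produces when the attribute value is empty or has leading or trailing spaces.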

score 2 · Answer 2 · answered Aug 16 '13 at 13:21

Don't use Regex for parsing HTML. If you're using .NET, you can use something like the HTML Agility Pack.

For your particular query, you could probably do something like:

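var classNames = htmlDoc.DocumentNode
    .Descendants("div")
    // GetAttributeValue returns the fallback ("") when a div has no class attribute,
    // whereas Attributes["class"].Value would throw on such a div.
    .Select(x => x.GetAttributeValue("class", ""))
    .Where(value => value != "");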

Alexander van Oostenrijk · Answer 3 · 2013-08-16T13:35:40.780

score 2

Regular expressions are greedy by default, so ".*" matches from the opening " to the last " it can reach, giving the longest string possible. What you need is for the match to stop at the first closing ". Try this:

class=\"[^\"]*\"
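As a sketch of how that pattern could be applied, here it is in JavaScript against the HTML from the question. The capture group (the parentheses) is an addition so that only the name comes back rather than the whole class="..." text. This still treats the file as plain text, so it would also pick up something like data-class="..." or a class attribute inside a comment.

```javascript
// The HTML from the question, held in a string for illustration.
const page = `<html>
    <div class="myFirstClass"></div>
    <div class="mySecondClass2"></div>
</html>`;

// [^"]* matches any run of characters that are not a quotation mark,
// so the match cannot run past the closing quote.
// The parentheses capture only the text between the quotes.
const values = Array.from(page.matchAll(/class="([^"]*)"/g), (m) => m[1]);

console.log(values); // [ 'myFirstClass', 'mySecondClass2' ]
```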

edited Aug 16 '13 at 13:35

answered Aug 16 '13 at 13:22


Amith · Accepted Answer · 2013-08-16T13:36:09.097

score 2

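Use jQuery's map function:

// Collect the class attribute of every div into a plain array.
// Divs without a class attribute are skipped, because map drops undefined values.
var classes = $("div").map(function () {
    return $(this).attr("class");
}).get();
for (var i = 0; i < classes.length; i++) {
    console.log(classes[i]);
}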

JSFiddle link of working code

http://jsfiddle.net/mkamithkumar/dLkkY/

edited Aug 16 '13 at 13:36

answered Aug 16 '13 at 13:29

