1

I want to get all the class names in an HTML file. What regex can I use? I don't understand how to build it. I have this code:

HTML

<html>
    <div class="myFirstClass"></div>
    <div class="mySecondClass2"></div>
</html>

I want to know how to get:

myFirstClass
mySecondClass2

using regex. I tried class=".*", but it matches everything, even text outside the name.

Andy Lester
tiagomac
  • rather than regex use xpath to query dom – Amith Aug 16 '13 at 13:17
  • or use a framework like jquery to ask for classes – 75inchpianist Aug 16 '13 at 13:18
  • `` – SLaks Aug 16 '13 at 13:18
  • First of all, what language are you using ? Have you considered using an HTML parser ? – HamZa Aug 16 '13 at 13:19
  • [Mandatory Read](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) *(someone had to do it and it's still amazingly fun to read)* – Lieven Keersmaekers Aug 16 '13 at 13:22
  • Adding this as a comment rather than just editing your post, but your HTML is invalid (the closing tags on the divs). – Chris McAtackney Aug 16 '13 at 13:27
  • Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See: [Stack Overflow question checklist](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist). Also, every regex question should state the language/flavor being used, since there are some differences in syntax and capabilities between flavors. **To all the guys who answered** with their "favorite" language: **do not answer** because it's **useless** for the OP and even for future visitors. – HamZa Aug 16 '13 at 13:33
  • @Amith Why in the world did you change the tags to JS and jQuery ? – HamZa Aug 16 '13 at 13:56
  • @LievenKeersmaekers: Please don't post links to that question, because they are not helpful to the reader, unless you follow it up with something that is an answer they can use. *You* know the point of the comment and that wall of text is that parsing HTML with regexes is a bad idea. However, to someone else who is asking, that is not at all clear. Worse, it doesn't point the reader to any useful solutions that *can* help parse HTML reliably. – Andy Lester Aug 16 '13 at 15:28

4 Answers

2

Your attempt, class=".*", is on the right track. The main problem is that * is "greedy": it takes as many characters as it can, so the match will probably run to the last quotation mark on the line.

One option is to use \w instead of ., to only retrieve word characters. Depending on the language you're using, I would think an HTML parser might be a better option. Many languages have such libraries available.

NOTE: Unless your usage is pretty basic, a regex with \w would also need to account for multiple space-separated class names.
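
To illustrate the points above, here is a minimal JavaScript sketch. The question never states a language, so JavaScript is an assumption, and the sample strings and variable names are made up for the demonstration. It compares the greedy pattern with a \w-based one when two attributes share a line, then loosens the pattern to allow several space-separated names.

```javascript
// Hypothetical sample: two class attributes on the same line.
const html = '<div class="myFirstClass"></div><div class="mySecondClass2"></div>';

// Greedy: .* runs on to the LAST quotation mark on the line.
const greedy = html.match(/class=".*"/)[0];

// \w+ stops at the first non-word character, i.e. the closing quote.
const wordOnly = html.match(/class="\w+"/)[0];

// \w alone misses space-separated names, so allow whitespace and split.
// Assumption: names consist only of word characters and hyphens.
const multi = '<div class="alpha beta-gamma"></div>';
const names = multi.match(/class="([\w\s-]+)"/)[1].trim().split(/\s+/);

console.log(greedy);   // class="myFirstClass"></div><div class="mySecondClass2"
console.log(wordOnly); // class="myFirstClass"
console.log(names);    // [ 'alpha', 'beta-gamma' ]
```

Real class names can contain other characters as well, so the character set here is only a starting point.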

Katana314
2

Don't use Regex for parsing HTML. If you're using .NET, you can use something like the HTML Agility Pack.

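For your particular query, you could probably do something like:

// Assumes htmlDoc is an HtmlAgilityPack.HtmlDocument that has already been loaded.
var classNames = htmlDoc.DocumentNode
    .Descendants("div")
    .Where(x => x.Attributes["class"] != null) // skip divs without a class attribute
    .Select(x => x.Attributes["class"].Value);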
Chris McAtackney
2

Regular expressions are greedy by default, so ".*" matches from the opening " to the last " it can reach, producing the longest possible match. What you need is for the match to stop at the first closing ". Try this:

class=\"[^\"]*\"
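
As a rough sketch of how that pattern might be used, assuming JavaScript (the question never names a language) and a hypothetical helper called extractClassNames, a capture group can pull out just the attribute value, which is then split on whitespace. It only handles double-quoted attributes and can still be fooled by text that merely looks like markup, which is the reason the comments recommend a real parser.

```javascript
// Hypothetical helper, not part of the original answer.
function extractClassNames(html) {
  // [^"]* matches up to, but not including, the next quotation mark.
  const pattern = /class="([^"]*)"/g;
  const names = [];
  for (const match of html.matchAll(pattern)) {
    // A class attribute may hold several space-separated names.
    for (const name of match[1].split(/\s+/)) {
      if (name !== '') names.push(name);
    }
  }
  return names;
}

// The HTML from the question.
const sample = `<html>
    <div class="myFirstClass"></div>
    <div class="mySecondClass2"></div>
</html>`;

console.log(extractClassNames(sample)); // [ 'myFirstClass', 'mySecondClass2' ]
```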
Alexander van Oostenrijk
2

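Use jQuery's map function:

// Collect the class attribute of every div into a plain array.
// Divs without a class are skipped, because .map() drops undefined values.
var classes = $("div").map(function() {
    return $(this).attr('class');
}).get();
for (var i = 0; i < classes.length; i++) {
    console.log(classes[i]);
}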

JSFiddle link of working code

http://jsfiddle.net/mkamithkumar/dLkkY/

Amith