
I am trying to write a regex that matches a valid CSS class name structure. I have this so far:

$pattern = "([A-Za-z]*\.[A-Za-z]+\s*{)";

$regex = preg_match_all($pattern, $html, $matches);

However, a class name can be in the following formats that my regex won't match:

p.my_class{
}
p.thisclas45{
}

These are just some cases. I've looked around for the rules on how you can name a class in a style block but couldn't find anything. Does anyone know where the rules for class naming are documented?

Are there any more cases that I need to consider? What regex would you use to match a class name?

I have already narrowed it down to a style block using the PHP DOM Document class.

binaryLV
Abs
  • Where are your delimiters? And did you mean CSS? – Lightness Races in Orbit Jun 13 '11 at 10:12
  • In CSS, class names begin with a dot. Your regexp does not match so. :-? – Álvaro González Jun 13 '11 at 10:14
  • @Tomalak - sorry, I removed them for some reason; it's a `#`. @Álvaro - it does match it, as I have a `*`, which is 0 or more characters, in front of the `.`. – Abs Jun 13 '11 at 10:17
  • @Álvaro: I don't think he strictly means class names, either. Terminology fail all round, here. – Lightness Races in Orbit Jun 13 '11 at 10:17
  • @Abs: Edit them in then? – Lightness Races in Orbit Jun 13 '11 at 10:18
  • @Abs: Are you after: (a) classname [you don't have this right atm]; (b) selector including classname [looks about right!]; or (c) any selector [you're missing loads of cases]? – Lightness Races in Orbit Jun 13 '11 at 10:20
  • @Tomalak - you're right, I am missing loads of cases. That's why I was hoping for a link similar to the one Peter gave, but he also gave a valid regex which works for me! – Abs Jun 13 '11 at 10:23
  • It would help to know what you are trying to accomplish with this regex. What will the HTML markup look like? What will you use the results for? Are you trying to make your search pattern work exactly like CSS selectors in the browser? There are many valid combinations of class and element names in those, and it is not likely you'll be able match all of those in just one regex pattern. – weltraumpirat Jun 13 '11 at 10:23
  • @Abs - I just don't get it. If it's supposed to allow both valid and invalid names, what's the whole purpose of the regexp? – Álvaro González Jun 13 '11 at 10:23
  • @Álvaro - It isn't supposed to match invalid class names. @weltraumpirat - I will be parsing some HTML from users and checking whether they have used class names; if so, I get those class names and that's it. – Abs Jun 13 '11 at 10:26
  • `(` and `)` (as well as `{}`, `[]` and `<>`) may be used as delimiters, i.e., `"([A-Za-z]*\.[A-Za-z]+\s*{)"` is a fully valid pattern. If another delimiter is used, there's no need to put the whole pattern in `()`, i.e., it can be either `(something)` or `#something#`, and there's no need to write `#(something)#`, as you would then be using the whole pattern as a *subpattern*. – binaryLV Jun 13 '11 at 10:30
  • @binaryLV - thank you for that explanation. I needed to know that. – Abs Jun 13 '11 at 10:32
  • @Abs, it's OK. To be honest, I have never used and have never seen others using `()`, `{}`, `[]` or `<>` in practice. I've just read once about such possibility ([docs](http://lv.php.net/manual/en/regexp.reference.delimiters.php) about pattern delimiters in PHP). – binaryLV Jun 13 '11 at 10:41
  • @binaryLV: But he's using `(` and `)` as a capture group, not as delimiters. Now he has no capture group. This may not be a problem given that the group encases the entire expression. – Lightness Races in Orbit Jun 13 '11 at 13:03
  • @Tomalak Geret'kal, he didn't say that brackets were used as a capture group. He only said that `#` were removed for some reason, and obviously code was running *fine*, so, with `#` as delimiters, brackets actually were NOT used in any way. I don't see anything in pattern that would indicate using them as a capture group, and there's no need to pollute `$matches` with something that would be identical to `$matches[0]`. – binaryLV Jun 13 '11 at 13:17
  • @binaryLV: What? In his original expression `"#([A-Za-z]*\.[A-Za-z]+\s*{)#"`, the capture group is _plain to see_. (As I said, though, I agree that it's a redundant one. Without the `#` delimiters the parentheses now have taken on a _different_ function as delimiters themselves rather than a redundant capture group.) – Lightness Races in Orbit Jun 13 '11 at 13:17
  • @Tomalak, in [revision 1 of 4 of this question](http://stackoverflow.com/revisions/6329090/1) I see `"([A-Za-z]*\.[A-Za-z]+\s*{)"` as being the original value of `$pattern` here, on SO. – binaryLV Jun 13 '11 at 13:26
  • @binaryLV: Read the OP's first comment. "sorry I removed them for some reason, its a `#`". His code has `#`. He lost it when writing on SO. – Lightness Races in Orbit Jun 13 '11 at 13:27
  • @Tomalak, why do you think that it was lost when writing on SO? Maybe it was deleted 2 days ago, when trying to write working expression in his favorite code editor or IDE? He only wrote that he "*removed them for some reason*" without specifying, when and why it was done and what was original pattern. Original pattern that was posted here, on SO, had `()` as working delimiters. See revision #1. – binaryLV Jun 13 '11 at 13:34
  • @binaryLV: Regardless, the OP clearly intended for them to be present. Taking an initial revision of an SO question as absolute gospel seems... odd. – Lightness Races in Orbit Jun 13 '11 at 13:35
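
To make the delimiter discussion above concrete, here is a minimal sketch comparing the two forms of the pattern. The sample `$css` string and the variable names are assumptions for illustration, not taken from the question. With `(` and `)` as delimiters, `$matches` holds only the full matches; with `#` as delimiters, the parentheses become capture group 1.

```php
<?php
// Hypothetical sample input, not from the original question.
$css = "p.myclass {}\ndiv.other{}";

// Parentheses act as the delimiters here, so the pattern has no capture group.
preg_match_all('([A-Za-z]*\.[A-Za-z]+\s*{)', $css, $withParens);

// '#' delimits the pattern, so the parentheses now form capture group 1.
preg_match_all('#([A-Za-z]*\.[A-Za-z]+\s*{)#', $css, $withHash);

echo count($withParens), "\n"; // 1: only index 0 (full matches)
echo count($withHash), "\n";   // 2: index 0 plus the identical group 1
```

Both calls find the same text; the only difference is the redundant `$matches[1]` entry in the second case.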

3 Answers

22

Have a look at http://www.w3.org/TR/CSS21/grammar.html#scanner

According to this grammar and the post "Which characters are valid in CSS class names/selectors?", this should be the right pattern to scan for CSS classes:

\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*\s*\{

Note: A tag name is not required as a prefix for a class selector in CSS. Just `.hello { border: 1; }` is also valid.
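
As a rough sketch of how the pattern above could be used from PHP: the helper name `find_class_names()`, the `~` delimiters, the added capture group around the name, and the sample CSS are all assumptions for illustration rather than part of the answer.

```php
<?php
// Sketch only: find_class_names() is a hypothetical helper.
function find_class_names(string $css): array
{
    // The pattern from the answer, wrapped in '~' delimiters, with a capture
    // group added around the name so the leading dot and the brace are left out.
    $pattern = '~\.(-?[_a-zA-Z]+[_a-zA-Z0-9-]*)\s*\{~';
    preg_match_all($pattern, $css, $matches);

    return $matches[1];
}

// Hypothetical sample: three valid names and one that starts with a digit.
$css = "p.my_class{\n}\np.thisclas45{\n}\n.modern-trade {\n}\n.9lives{\n}";
print_r(find_class_names($css)); // my_class, thisclas45, modern-trade
```

The name starting with a digit is skipped because the first character after the optional hyphen must be a letter or an underscore.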

Community
Peter
  • wow that was fast! I tested the above with a few variations including invalid class names and it works. Thank you very much! – Abs Jun 13 '11 at 10:21
  • What about using `[\w-]` instead of `[_a-zA-Z0-9-]`? `\w` matches any *word* character, i.e. *any letter or digit or the underscore character* (from [docs](http://lv.php.net/manual/en/regexp.reference.escape.php)). – binaryLV Jun 13 '11 at 10:36
  • `s/prefix for classes/prefix for selectors/` – Lightness Races in Orbit Jun 13 '11 at 10:54
  • That's not going to match `.modern-trade`. It should be `\.-?[_a-zA-Z\-]+[\w\-]*\s*\{`. – Gajus Jan 08 '13 at 23:52
  • Thanks, for me this will do :) But this regex doesn't take some weird rules into account – escaped and unicode characters. Here's a good read about that: https://mathiasbynens.be/notes/css-escapes – tomekwi Oct 03 '14 at 12:55
  • What is the point of the whitespace and bracket selection at the end? `\s*\{`? – Jake Wilson Apr 16 '15 at 21:01
  • @Jakobud It was part of the question. See above. – Peter Apr 17 '15 at 10:19
2

This regex:

/(\w+)?(\s*>\s*)?(#\w+)?\s*(\.\w+)?\s*{/gm

will match any of the following:

p.my_class{}
p.thisclas45{}
.simple_class{}
tag#id.class{}
tag > #id{}

You can experiment with it on RegExr.
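
For PHP, a sketch along these lines may help. It assumes the RegExr-style `g` flag is dropped (it is not a PHP pattern modifier, and `preg_match_all` already finds every match) and that the brace is escaped; `match_selectors()` is a hypothetical helper name.

```php
<?php
// Sketch only: match_selectors() is a hypothetical helper.
function match_selectors(string $css): array
{
    // Same pattern as above, without the 'g' flag and with the brace escaped.
    $pattern = '/(\w+)?(\s*>\s*)?(#\w+)?\s*(\.\w+)?\s*\{/m';
    preg_match_all($pattern, $css, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = [
            // A match can begin with the preceding newline, so trim whitespace and the brace.
            'selector' => trim($m[0], "{ \t\r\n"),
            // Group 4 is absent when the selector has no class part.
            'class' => $m[4] ?? '',
        ];
    }

    return $result;
}

$css = "p.my_class{}\np.thisclas45{}\n.simple_class{}\ntag#id.class{}\ntag > #id{}";
print_r(match_selectors($css));
```

For the five sample lines this yields the selectors `p.my_class`, `p.thisclas45`, `.simple_class`, `tag#id.class` and `tag > #id`, with an empty class entry for the last one.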

Paolo Stefan
  • @Tomalak do you mean the capture groups? Please take a look at the regexr link, the "replace tab" shows where they go: $1 is the tag, $2 the ancestor (>), $3 the id and $4 the class name (without the '{'). If you mean the full regexp, it is `/(\w+)?(\s*>\s*)?(#\w+)?\s*(\.\w+)?\s*{/gm` – Paolo Stefan Jun 13 '11 at 12:31
  • No, your delimiters, as I said. You didn't include them in your answer. Delimiters are _part_ of the expression, and enough people don't use them properly that leaving them out of the answer is dangerous. – Lightness Races in Orbit Jun 13 '11 at 13:02
  • but it will not match something like this: `.icon-something:before { content: "\e935"; }` – Sergiu May 05 '16 at 13:11
1

This regex is meant to select all class names in a CSS file, even when the selectors are complex.

/(?<=\.)([a-zA-Z0-9_-]+)(?![^\{]*\})/g

E.g.:

.class-1:focus > :is(button, a, div) > :first-child > .class2:first-child > .class_3 #id-1 + * { 
    padding: 8.3px;
    -webkit-box-align: center;
    color: #ff4834 !important;
}
@keyframes shimmer {
    0% {
        -webkit-transform: translateX(-100%);
        transform: translateX(-100%);
    }
    to {
        -webkit-transform: translateX(100%);
        transform: translateX(100%);
    }
}

Output:

['class-1', 'class2', 'class_3']
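
The pattern above is written with a `g` flag, which PHP does not accept as a modifier. A minimal PHP sketch, assuming `preg_match_all` in place of the flag and dropping the capture group, could look like this; `css_class_names()` and the sample CSS are hypothetical. The negative lookahead rejects a candidate when a `}` appears before the next `{`, which is the case for text inside a declaration block, such as the `3px` in `8.3px`.

```php
<?php
// Sketch only: css_class_names() is a hypothetical helper.
function css_class_names(string $css): array
{
    // (?<=\.)        the name must directly follow a dot
    // (?![^{]*\})    reject it if a closing brace comes before the next opening brace
    $pattern = '/(?<=\.)[a-zA-Z0-9_-]+(?![^{]*\})/';
    preg_match_all($pattern, $css, $matches);

    return $matches[0];
}

// Hypothetical sample input.
$css = ".a-1:hover > .b_2 {\n    padding: 8.3px;\n}";
print_r(css_class_names($css)); // a-1, b_2
```

This is still a heuristic: a dot that appears outside any block but is not a class selector would also be picked up.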
  • `g` is not an appropriate pattern modifier in PHP. The lookbehind can be replaced by `\.\K` -- this should largely improve performance because the regex engine won't need to look backward after every matching word or hyphen substring. The capture group is unneeded because it will be identical to the fullstring match. Your answer does not explain the reason for that negated lookahead subpattern. – mickmackusa May 09 '23 at 03:04
  • @mickmackusa is correct on every count, HOWEVER I picked up the regex between the two slashes and it worked great for a `Find in Files` in vscode across a very complex set of CSS files. – Arno Dverjging Jun 21 '23 at 20:37