
I'm looking for a regular expression to find all instances of a CSS class name in HTML markup. So far I have this, assuming row is the class name that I'm looking for:

class=\"[a-zA-Z0-9\-_\s]*row[a-zA-Z0-9\-_\s]*\"

It correctly matches all of the following:

class="foo_bar bar row test"
class="row"
class="hello foo bar  row"
class=" foo bar  row test "

And correctly doesn't match this:

class="hello"  row

Unfortunately it incorrectly matches these (false positives):

class="narrow"
class="rowdy"
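Here is a minimal Python sketch showing why the last two slip through. Python's `re` is an assumption, but most engines behave the same for this pattern. The character classes on either side of `row` accept letters, so they simply absorb `nar` and `dy`:

```python
import re

# The pattern from the question, unchanged; a raw string keeps the backslashes intact.
original = re.compile(r'class=\"[a-zA-Z0-9\-_\s]*row[a-zA-Z0-9\-_\s]*\"')

should_match = [
    'class="foo_bar bar row test"',
    'class="row"',
    'class="hello foo bar  row"',
    'class=" foo bar  row test "',
]
should_not_match = [
    'class="hello"  row',
    'class="narrow"',  # false positive: the leading class absorbs "nar"
    'class="rowdy"',   # false positive: the trailing class absorbs "dy"
]

for text in should_match + should_not_match:
    print(bool(original.search(text)), text)
```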

What regex will find a specific CSS class name in HTML?

Update: There are lots of comments saying I shouldn't parse the DOM with regex. My use case is a 'find all' across a large project with thousands of HTML files, to find where specific CSS classes are used. I'm not operating inside a browser and don't have access to a DOM.
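For that kind of project-wide search outside an editor, a small script can do the same job. The sketch below is Python and rests on several assumptions: the files are UTF-8 `.html`, `class` attributes are double-quoted with no spaces around `=`, and the pattern is the space-or-quote boundary pattern suggested in the comments below. `class_pattern` and `find_class_usages` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def class_pattern(name: str) -> "re.Pattern[str]":
    """Hypothetical helper: match class="..." containing `name` as a whole,
    space-separated class (e.g. "row" but not "narrow", "rowdy" or "a-row")."""
    n = re.escape(name)  # guard against regex metacharacters in the class name
    return re.compile(r'class="(?:' + n + r'|[^"]* ' + n + r')(?![^" ])[^"]*"')


def find_class_usages(root: str, name: str, glob: str = "*.html") -> Iterator[Tuple[Path, int, str]]:
    """Hypothetical helper: yield (file, line number, matched attribute) for every hit under root."""
    pattern = class_pattern(name)
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for m in pattern.finditer(text):
            line_no = text.count("\n", 0, m.start()) + 1  # 1-based line of the match start
            yield path, line_no, m.group(0)
```

For example, `for path, line, attr in find_class_usages("src", "row"): print(f"{path}:{line}: {attr}")` prints one editor-style `file:line:` entry per match (`"src"` is a placeholder directory).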

Johnny Oshika
  • Just to be sure: do you *have* to use regex as opposed to a DOM parser here? If you have to, I'd say adding `\b` (word boundary) before and after `row` should do it, though I didn't really think this through, so there might be better ways. – Jeto Mar 26 '19 at 22:09
  • Try `class="(?:row|[^"]* row)(?![^" ])[^"]*"` if `_row_` is not allowed too. See live demo here https://regex101.com/r/Xq4sT9/1 – revo Mar 26 '19 at 22:14
  • Also what about `"hello a-row"`? – revo Mar 26 '19 at 22:19
  • Oh yeah a word boundary isn't enough because of dashes (at least). – Jeto Mar 26 '19 at 22:22
  • You forgot that `class = "` (notice the spaces) is also legit syntax, and that the text `class="row` can also appear as plain page content. Stop using regex to parse DOM. Use what browsers already use: a [**DOMParser**](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser). [Tony the Pony he comes...](https://stackoverflow.com/a/1732454/383904) – Roko C. Buljan Mar 26 '19 at 22:31
  • @Jeto there's no access to a DOM. I'm doing a 'find all' in a big project with thousands of HTML files using a code editor (e.g. Atom, VSCode, Visual Studio, etc). – Johnny Oshika Mar 26 '19 at 23:31
  • @revo I'm looking for a specific class (e.g. `row`), so `a-row` should not match. – Johnny Oshika Mar 26 '19 at 23:31
  • @revo Your regex works. Please post an answer so I can accept it. Not sure how that works, but it works. :-) – Johnny Oshika Mar 26 '19 at 23:32
  • @RokoC.Buljan I don't have access to a DOM. I'm searching through thousands of HTML files with a text editor. – Johnny Oshika Mar 26 '19 at 23:37
  • It works because it only allows a space or a double quote around `row`. Please see the answer. – revo Mar 27 '19 at 08:48
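The word-boundary idea from the first comment is easy to check. This is a minimal Python sketch, and wrapping `\brow\b` in `[^"]*` is an assumption about how the boundaries would be used. It does reject `narrow` and `rowdy`, but `\b` also fires between `-` and `r`, so hyphenated classes such as `a-row` (and `row-fluid`, a made-up example) still match, as pointed out above:

```python
import re

# Word boundaries around the class name, inside a double-quoted class attribute.
word_boundary = re.compile(r'class="[^"]*\brow\b[^"]*"')

for text in ['class="narrow"', 'class="rowdy"', 'class="a row b"',
             'class="hello a-row"', 'class="row-fluid"']:
    # \b sits between a word character (letter, digit, underscore) and a
    # non-word character; "-" is a non-word character, so "a-row" and
    # "row-fluid" pass as well.
    print(bool(word_boundary.search(text)), text)
```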

2 Answers


Try the regex below:

(class\s?=\s?)\"([\d\w\s-]*)(\brow\b)([\d\w\s]*)\"

I tested it against all the cases you mentioned on https://regex101.com
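Here is a quick Python check of this pattern. It assumes each character class is followed by a `*` quantifier; without one, each class matches exactly one character, and `class="row"` cannot match. The pattern accepts the question's positive cases and rejects `narrow`, `rowdy` and `class="hello"  row`. However, the first class allows `-`, and `\brow\b` only needs a word boundary, so `class="hello a-row"` still matches. The asker said that case should not match.

```python
import re

# Assumption: each character class carries a * quantifier.
answer = re.compile(r'(class\s?=\s?)\"([\d\w\s-]*)(\brow\b)([\d\w\s]*)\"')

cases = [
    'class="foo_bar bar row test"',
    'class="row"',
    'class="hello foo bar  row"',
    'class=" foo bar  row test "',
    'class = "row"',        # \s? allows one space on each side of =
    'class="hello"  row',
    'class="narrow"',
    'class="rowdy"',
    'class="hello a-row"',  # "-" in the first class lets \b fire before "row"
]
for text in cases:
    print(bool(answer.search(text)), text)
```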

Jobelle
  • Thanks, that's pretty good. It fails this test though: `class="flex-mt90 foo bar row" row`, but I realize that I didn't have it in my list of examples. https://regex101.com/r/jeos4r/2 – Johnny Oshika Mar 27 '19 at 18:52

You have to add boundaries, but \b isn't enough. It matches the position between - and r in a-row, which is how a word boundary is supposed to behave but not what you want here. You need a boundary that only allows a space, the position right after the opening ", or the position right before the closing " of the class attribute. To get that, write a pattern with two branches:

class="(?:row|[^"]* row)(?![^" ])[^"]*"

The above could be shortened to (though not preferred):

class="(?:[^"]* )?row(?![^" ])[^"]*"

Shorter, and the same as the longer one performance-wise:

class="(?:[^"]* )??row(?: [^"]*)?"

Regex breakdown:

  • class=" Match class=" literally
  • (?: Start of non-capturing group
    • row Match row
    • | Or
    • [^"]* row Match row preceded by a space character
  • ) End of non-capturing group
  • (?![^" ]) The next character must be a space or " (the negative lookahead fails on anything else)
  • [^"]*" Match up to and including "
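The breakdown maps one-to-one onto a commented pattern. This sketch uses Python's `re.VERBOSE` flag, which is an assumption; many other engines have an equivalent `x` flag. In verbose mode, unescaped whitespace outside character classes is ignored, so the literal space before `row` is written as `[ ]`. Inside `[^" ]` the space is kept as-is.

```python
import re

annotated = re.compile(r'''
    class="          # match class=" literally
    (?:              # start of non-capturing group
        row          #   row right after the opening quote
      |              #   or
        [^"]*[ ]row  #   row preceded by a space character
    )                # end of non-capturing group
    (?![^" ])        # next character must be a space or "
    [^"]*"           # match up to and including the closing "
''', re.VERBOSE)

compact = re.compile(r'class="(?:row|[^"]* row)(?![^" ])[^"]*"')

for text in ['class="row"', 'class="a row b"', 'class="narrow"', 'class="a-row"']:
    # Both spellings should agree on every input.
    print(bool(annotated.search(text)), bool(compact.search(text)), text)
```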

See live demo here
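The three spellings can be compared on the question's cases in a few lines of Python. This sketch only checks that they agree on these inputs and does not measure performance. The `row-fluid` and two-attribute inputs are made up for illustration.

```python
import re

variants = {
    "two branches": r'class="(?:row|[^"]* row)(?![^" ])[^"]*"',
    "optional prefix": r'class="(?:[^"]* )?row(?![^" ])[^"]*"',
    "lazy prefix": r'class="(?:[^"]* )??row(?: [^"]*)?"',
}
expected = {
    'class="foo_bar bar row test"': True,
    'class="row"': True,
    'class="hello foo bar  row"': True,
    'class=" foo bar  row test "': True,
    '<a class="narrow"><b class="row">': True,
    'class="hello"  row': False,
    'class="narrow"': False,
    'class="rowdy"': False,
    'class="hello a-row"': False,
    'class="a row-fluid"': False,
}

for name, pattern in variants.items():
    mismatches = [t for t, want in expected.items() if bool(re.search(pattern, t)) != want]
    print(f"{name}: {'ok' if not mismatches else mismatches}")
```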

revo
  • This is great and has saved me lots of time cleaning up CSS/HTML in a large project. – Johnny Oshika Mar 28 '19 at 16:56
  • Glad to hear. I just looked at the regex and realized it could be written shorter without affecting performance. So I added it. – revo Mar 28 '19 at 17:08