
I am working on a regexp matcher to find a string inside a larger string. In my use case, the string is a class name that appears inside the HTML attribute class=. While this may look like a duplicate of the historic answer to the question "regex-match-open-tags-except-xhtml-self-contained-tags", I think it is closer to "finding an instance of a string inside a larger string", which is something regular expressions can help with, in my understanding... but then again, here I am at Stack Overflow.

I thought word boundaries might do what I want, since they work with Ruby's gsub to find occurrences of my search string within a larger string, but they aren't returning the correct results when used in a regexp matcher. I think I am reaching the limits of word boundaries anyway when trying to match text that follows an occurrence of class=.
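To illustrate why a plain word boundary falls short here: `-` is a non-word character, so `\b` is satisfied right after `text-small` even when `-v2` follows, and nothing ties the match to a class= attribute. A minimal sketch (the variable names are only for illustration):

```ruby
# \b sits between a word character (letter, digit, underscore) and a
# non-word character. "-" is a non-word character, so a boundary exists
# right after "small" in "text-small-v2".
word_boundary = /\btext-small\b/

wanted   = 'class="text-small"'
unwanted = 'class="text-small-v2"'
stray    = 'Hey here is some text-small too'

puts word_boundary.match?(wanted)    # true
puts word_boundary.match?(unwanted)  # true (false positive)
puts word_boundary.match?(stray)     # true (not inside class=)
```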

I'm wondering if there are other tools in the regexp toolbox that I could use to achieve the following:

When searching for the string "text-small", I want one occurrence to be returned for each of the following cases:

  • class="text-small"
  • class="other-class text-small"
  • class="text-small other class"

So we can notice a few requirements: the string must be immediately preceded by a space or a ", it must be immediately followed by a space or a ", and it needs to be preceded at some point by a class=.

We do not want to match class="text-small-v2", and we do not want to match random occurrences of "text-small" that are not inside a class= attribute.

An example:

So <p class="text-small some-other-class">Hey here is some text-small too</p><span class="text-small-v2">hello there</span> should result in 1 occurrence.

Here's the Rubular permalink using @bobblebubble's suggestion, which seems to work well: https://rubular.com/r/yXhEnbxp6OKtZk
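For reference, here is roughly how that suggested pattern could be used from Ruby to count occurrences. This is a minimal sketch under assumptions: `count_class_occurrences` is a made-up helper name, and it only handles double-quoted class attributes.

```ruby
# Hypothetical helper (not part of any library): counts class="..."
# attributes that contain class_name as a whole, whitespace-delimited token.
# Assumes attribute values are wrapped in double quotes.
def count_class_occurrences(html, class_name)
  # \bclass\s*=\s*"   the attribute name, "=", and the opening quote
  # (?:[^"]*\s)?      optionally, other classes followed by whitespace
  # [\s"]             the token must end at whitespace or the closing quote
  pattern = /\bclass\s*=\s*"(?:[^"]*\s)?#{Regexp.escape(class_name)}[\s"]/
  html.scan(pattern).length
end

html = '<p class="text-small some-other-class">Hey here is some text-small too</p>' \
       '<span class="text-small-v2">hello there</span>'

puts count_class_occurrences(html, "text-small")  # prints 1
```

Because the pattern also consumes the whitespace or quote that follows the class name, it is only suitable for detecting matches, not for replacing them directly.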

Thanks for any and all help!

Danny
  • Please mention the language and the purpose. E.g. [`\bclass\s*=\s*"(?:[^"]*\s)?text-small[\s"]`](https://regex101.com/r/mb6GNu/1) (without using lookarounds) – bobble bubble Nov 16 '22 at 02:31
  • Thanks @bobblebubble that seems to match the correct text in my string. While the question has been marked a duplicate, I don't think finding a substring in a string using regex is the same as what is being discussed in that linked answer...if you are able to add your answer as a proper answer I'll mark it accepted. Appreciate the help! – Danny Nov 16 '22 at 15:37
  • Glad it helped! Answers to questions that deal with parsing HTML by regex are sadly heavily downvoted here (presumably due to lack of understanding). It always depends on the goal imho, and on whether it's one's own HTML and not arbitrary. I wrote the regex **only for finding** matches; it will certainly break valid HTML if used for replacements without proper captures, because it also matches the trailing whitespace or double quote. – bobble bubble Nov 16 '22 at 15:41
  • That all sounds reasonable to me @bobblebubble, thanks again. I have explained my intentions in more detail in the question and described the requirements for this use case. My next task, once I find a match, is to use `Nokogiri::HTML` along with Ruby's `gsub` to find and replace the CSS class with another class. So the regexp says "here is a match", and then I'll use another tool to ensure valid HTML. – Danny Nov 16 '22 at 15:48
