I have been trying for quite some time to match class names that start with a specified pattern in an HTML string. Here is the regex I finally came up with:

/(?<=(<[^>]*class=("|')(\w|\ |\-)*))((?<= |\'|\")(foo-|-foo)[^ "']*)/gmi

The sample I have been working with is:

<body>
<div style="margin-left:6px;" class="foo-pink blfoo-pin-foo-kue red yellow bar-green -moz-FF foo-pink moz-FF foo-pink" >
    <fieldset class="foo customClass foo clFieldsBar  bar-try" id="idField foo- bar-dfgdgdfg">         
         <legend><span>Qu'en pensez-vous ?</span></legend>
        <textarea id="idText" class='foo- Comment_text fdgdgdfg -foo-ddede  mso-whitespace' name="nameText barName"></textarea> bar-deded foo-green
    </fieldset>
    class="blue dffsf sdf mso-green foo"
</div>  

You can see this regex doing exactly what I want here: https://regex101.com/r/6W5AUT/4

The problem is that I need to run the regex from Delphi code. However, when I do so, I get the following error:

lookbehind assertion is not fixed length

A quick search for the cause of the error led me to discover that the regex engine does not accept a lookbehind of variable length, and mine, (?<=...), contains quantifiers.
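
The restriction is not specific to one engine. Purely as an illustration, Python's standard re module (which is not the engine used from Delphi) enforces a similar rule, so the difference between a fixed-length and a variable-length lookbehind can be checked quickly. The helper name compiles and both patterns are made up for this sketch:

```python
import re

def compiles(pattern):
    """Return True if the engine accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-length lookbehind: the assertion always covers exactly 6 characters.
fixed = r"(?<=class=)\w+"
# Variable-length lookbehind: the \w* quantifier makes its length unbounded.
variable = r"(?<=class=\w*)\w+"

print(compiles(fixed))     # True
print(compiles(variable))  # False
```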

I have been trying to transform my RegEx using different methods (\K to reset the match for example), and this is what I came up with so far:

/(<[^>]*class=("|'))(\b(foo)(\w|\-)*)*/gmi

You can see it working here: https://regex101.com/r/zeQDrK/2

As you can see, it is only matching the first class name of each class attribute in a tag.

Now to be precise about what I need:

  • Match all occurrences of a class name that starts with a pattern (it can be "foo", "-foo" or a combination of both),
  • Match only class names that are inside an HTML tag (this is why the sample contains class="blue dffsf sdf mso-green foo" outside of any tag: it must not match),
  • Support both class="ClassName1 ClassName2" and class='ClassName1 ClassName2'.
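
One possible way to meet these three requirements without any lookbehind is to split the work into two passes: first capture the value of each class attribute that sits inside a tag, then test each whitespace-separated name against the prefixes. The sketch below is in Python rather than Delphi, the names CLASS_ATTR, PREFIXES and prefixed_class_names are hypothetical, and it assumes that no attribute value contains a '>' character. It is not a general HTML parser.

```python
import re

# Pass 1: capture the value of a class attribute that sits inside a tag.
# [^>]*? cannot cross a '>', so a stray class="..." in text content is skipped.
# (["']) ... \1 accepts either quote style and requires the same closing quote.
CLASS_ATTR = re.compile(r"""<[^>]*?\sclass=(["'])(.*?)\1""", re.IGNORECASE)

# Same prefixes as the (foo-|-foo) group of the original regex.
PREFIXES = ("foo-", "-foo")

def prefixed_class_names(html):
    """Return every class name inside a tag that starts with one of PREFIXES."""
    found = []
    for attr in CLASS_ATTR.finditer(html):
        # Pass 2: split the attribute value on whitespace and test each name.
        for name in attr.group(2).split():
            if name.lower().startswith(PREFIXES):
                found.append(name)
    return found

sample = (
    '<div style="margin-left:6px;" class="foo-pink blfoo-pin red -moz-FF foo-pink">\n'
    '  <fieldset class="foo customClass bar-try" id="idField foo- bar-x">\n'
    "    <textarea id=\"idText\" class='foo- Comment_text -foo-ddede mso-x'></textarea> foo-green\n"
    '  </fieldset>\n'
    '  class="blue mso-green foo-stray"\n'
    '</div>'
)

print(prefixed_class_names(sample))
# ['foo-pink', 'foo-pink', 'foo-', '-foo-ddede']
```

The first pattern uses no lookbehind, \G or \K, so it should carry over to a PCRE-based engine, but that has not been verified here.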

I would appreciate any help to solve this problem. Thanks for your time.

David Heffernan
VLGBas
  • Here is an [interesting read](https://stackoverflow.com/a/1732454/8041231). – Victoria Jun 26 '18 at 10:07
  • Indeed, I am aware of the problem. I am open to any solution (including parsing my HTML as XML, which I'll dig into), but how come the first regex I posted does exactly what I want? I think the fact that it can be done one way makes the other way (without lookbehind) possible. Anyway, thanks for the link; it was helpful for understanding why I have been having such a hard time. – VLGBas Jun 26 '18 at 10:16
  • [`(?:<[^>]*?class=(?:["'])|\G(?!^))(?:[\w -])*?\K((?<=[ '"])(foo-|-foo)[^ "']*)`](https://regex101.com/r/zeQDrK/3) is a PCRE way to express your original RegEx. I am pretty sure there is some existing html code that will break it. – Sebastian Proske Jun 26 '18 at 10:59
  • I have been testing your regex, and it seems to work for what I need. For other people stuck on a similar problem, and for my own knowledge, could you please explain how you made the change? Do you have any documentation I could read to get better at such cases? Thanks a lot anyway. – VLGBas Jun 26 '18 at 12:04
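
The PCRE regex in the comment above relies on two features: \G, which matches only at the position where the previous match ended (the (?!^) guard stops it from also matching at the very start of the input), and \K, which drops everything matched so far from the reported match. Together they let each new match continue inside the class attribute that the previous match was in. Python's re module has neither feature, so the following sketch only emulates the idea with Pattern.match(string, pos) and a capture group. The names OPENER, CONTINUE and chained_matches are hypothetical.

```python
import re

# Plays the role of the first alternative: <[^>]*?class=["']
OPENER = re.compile(r"""<[^>]*?class=["']""", re.IGNORECASE)

# Plays the role of \G ... \K ...: it is only ever tried at the position where
# the previous step ended, and group 1 stands in for the text kept after \K.
CONTINUE = re.compile(r"""[\w -]*?(?<=[ '"])((?:foo-|-foo)[^ "']*)""", re.IGNORECASE)

def chained_matches(html):
    """Collect prefixed class names by chaining matches inside each attribute."""
    found = []
    pos = 0
    while True:
        opener = OPENER.search(html, pos)
        if opener is None:
            return found
        pos = opener.end()
        # Keep matching from where the last step stopped, like \G would.
        while True:
            step = CONTINUE.match(html, pos)
            if step is None:
                break
            found.append(step.group(1))
            pos = step.end()

print(chained_matches('<div class="foo-a x -foo-b" id="foo-c">foo-d</div>'))
# ['foo-a', '-foo-b']
```

The chain stops at the closing quote because [\w -] cannot consume it, which is why foo-c in the id attribute and foo-d in the text are not reported.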
