
I have an HTML tag as a string. If the class attribute of this tag includes 'md', I want to match the tag and get the expression inside it.

Example:

<tag class="...blah md blah...">(expression)</tag>

  • The <tag></tag> pair is the first condition
  • The class attribute including md is the second condition
  • At the same time, the tag must not be empty.

In other words, I need a regex that starts at <tag>, ends at </tag>, and matches only tags with md in the class attribute, but I couldn't work it out.

What I tried was matching tags that have md directly as an attribute, but that is wrong. It also has problems with nested tags.

(<b md(?!<|>).+>|<b \S+ md>|<b md>|<b .+ md .+>)(.+)(<\/b>)

https://regex101.com/r/3Vv0WG/1

I decided that the correct approach is to look for md in the class attribute, but I could not write that regex. Thanks for your help.

Example:

  • <b class="... md ..."></b> does not match because the tag is empty
  • <i class="..."></i> does not match because the class attribute does not include md
  • <span class="...md...">ANYTHING</span> matches

It would be better if the tags were not nested, because nesting causes chaos in the code.

Andy Lester
RıdvanÖnal
  • Why not use the dom and javascript with `classList.contains()` or a domparser? – The fourth bird May 29 '21 at 10:54
  • This is in a replacement loop and I have a string, not an HTML element. I need to replace the text that meets this condition, so I have to check the class with a regex because it is not a DOM element, and I also need to check its tag. – RıdvanÖnal May 29 '21 at 10:59
  • Can there be nested elements? – The fourth bird May 29 '21 at 11:01
  • May depend on the condition :) – RıdvanÖnal May 29 '21 at 11:04
  • @RıdvanOnal The fourth bird is probably asking because it makes a big difference whether nested tags can occur, e.g. `abcde`. Maybe you could clarify your question a bit, with scenarios, input, expected outcome, what tool you're using... – bobble bubble May 29 '21 at 11:11
  • Exactly :-) You might get away with https://regex101.com/r/Girb1Y/1 but this can easily break, and reading the string with a [DOMParser](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) might still be a better option. – The fourth bird May 29 '21 at 11:13
  • It would be more appropriate not to be nested because it causes chaos in the code. – RıdvanÖnal May 29 '21 at 11:19
  • https://regex101.com/r/Girb1Y/1 This was exactly the regex I was looking for. I will add this as an answer to the question. If you post it as an answer I can confirm it. @Thefourthbird – RıdvanÖnal May 29 '21 at 11:26
  • The question has been updated and I think is pretty clear now. – The fourth bird Jun 02 '21 at 21:21

1 Answer


If you have no parser or DOM available and can only extract the parts from the string with a pattern, you might get away with:

<(\w+) [^<>]*\bclass\s*=\s*"[^"]*\bmd\b[^"]*"[^<>]*>[^<>]+<\/\1>

Regex demo
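As a rough sketch of how the pattern could be used in a replacement loop on a plain string, the snippet below adds one extra capture group around the inner text. That group is an addition for illustration and is not part of the pattern above; the names `mdTag`, `html`, and `replaceInner` are likewise hypothetical.

```javascript
// The pattern above with the global flag, plus one extra capture group
// (added for this sketch) around the inner text.
const mdTag = /<(\w+) [^<>]*\bclass\s*=\s*"[^"]*\bmd\b[^"]*"[^<>]*>([^<>]+)<\/\1>/g;

// Hypothetical input: one matching tag, one empty tag, one tag without md.
const html = '<p class="note md wide">**bold**</p><b class="md"></b><i class="x">skip</i>';

// Collect the inner text of every matching tag.
const inner = [...html.matchAll(mdTag)].map((m) => m[2]);

// Replace only the inner text and keep the opening and closing tags.
// The opening tag cannot contain ">", so the first ">" in the match ends it.
function replaceInner(input, transform) {
  return input.replace(mdTag, (whole, tagName, text) => {
    const openingTag = whole.slice(0, whole.indexOf(">") + 1);
    return openingTag + transform(text) + "</" + tagName + ">";
  });
}

console.log(inner);
console.log(replaceInner(html, (text) => text.toUpperCase()));
```

Passing a callback to `replace` avoids running a second regex over each match, and only the tags that satisfy all three conditions are touched.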

Notes

  • [^...] is a negated character class, matching any char except those listed inside the brackets
  • (\w+) captures 1+ word chars in group 1, and \1 is a backreference to match the same as group 1
  • The pattern assumes that for the ANYTHING parts there are no chars < or >
  • The md is matched between word boundaries, preventing a partial match with another "word"
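The notes above can be checked against a few hand-written inputs. This is a minimal sketch: the sample strings are hypothetical, and the `md-large` case is included because `\b` treats a hyphen as a boundary, so a hyphenated class name starting with `md` still matches.

```javascript
// The pattern from the answer, unchanged (no global flag, so test() is stateless).
const pattern = /<(\w+) [^<>]*\bclass\s*=\s*"[^"]*\bmd\b[^"]*"[^<>]*>[^<>]+<\/\1>/;

// Hypothetical samples: [input, expected result of pattern.test(input)]
const samples = [
  ['<span class="a md b">ANYTHING</span>', true],  // md present, tag not empty
  ['<b class="a md b"></b>', false],               // empty tag: [^<>]+ needs 1+ chars
  ['<i class="a b">text</i>', false],              // no md in the class attribute
  ['<b class="mdx">text</b>', false],              // \b prevents a partial word match
  ['<b class="md-large">text</b>', true],          // caveat: "-" counts as a boundary
  ['<b class="md">text</i>', false],               // \1 requires the same tag name
  ['<div class="md"><b>text</b></div>', false],    // nested tags contain < and >
];

for (const [input, expected] of samples) {
  const ok = pattern.test(input) === expected;
  console.log(ok ? "ok  " : "FAIL", input);
}
```

If hyphenated class names such as `md-large` should be rejected, the boundaries around `md` would need to be stricter than `\b`, for example by checking for whitespace or the surrounding quotes instead.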

» Food for thought, read about tony the pony.

The fourth bird