
I'm looking for a regex to find HTML comments if the comment is not within a code block (represented by three backticks ```).

So far, I can avoid matching HTML comments when the backticks are on the same line, but I can't find a way to avoid it when they are on the line before.

My regex now: ^((```){0})\s*<!--[\s\S]*?(?:-->)

Some of my tests: the first three should match, the last three should not (I can't write them out here because the backticks break the formatting).

[Screenshot of the test cases]

acoudouy
  • How are you doing this? Language, tool... – bobble bubble Dec 13 '22 at 09:04
  • It's pretty tricky to solve this with a regex only, e.g. you'll have a hard time matching the opening code tag `\`\`\`` without also matching the closing one (how do you tell whether a fence opens or closes a code block when both use the same tag?). A simpler solution would be to scan the string sequentially and pair each opening tag with the closing tag that follows it – Kaddath Dec 13 '22 at 09:09
  • You could [match what you don't want but *capture* what you need](https://www.rexegg.com/regex-best-trick.html#thetrick) [(regex101 demo)](https://regex101.com/r/6vD5IX/1). – bobble bubble Dec 13 '22 at 09:16
  • @bobblebubble that's a pretty clever "hack": this way you ensure all opening tags are consumed together with their closing ones, and you don't run into problems with lookbehinds or lookaheads – Kaddath Dec 13 '22 at 09:37
  • @Kaddath Yes, it's very useful and simple. In PCRE it can even be combined with verbs [`(*SKIP)(*F)`](https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex) [like this demo](https://regex101.com/r/6vD5IX/3) by skipping what's on the left side of the alternation and matching the right side. – bobble bubble Dec 13 '22 at 09:50
  • How about something like this, `/((?<=\`\`\`).*?(?=\`\`\`)|(?=))/gs`, it's not perfect as it doesn't capture the \`\`\` but after you match the comments and remove them (if you remove them) then you can match \`\`\`\`\`\` – Buttered_Toast Dec 13 '22 at 10:08
  • Thank you everyone for your answers. @bobble, I'm working with node.js – acoudouy Dec 14 '22 at 09:00
  • @acoudouy Is the goal to extract these comments outside? Or doing replacements... – bobble bubble Dec 14 '22 at 09:01
  • @Buttered_Toast I tested your suggestion but unfortunately couldn't make it work. I think I'll need a JS function instead of a plain regex. @bobble, for now the goal is to delete the comment if it is not within a code block – acoudouy Dec 14 '22 at 09:03
  • @acoudouy Have a look [at this JS demo](https://tio.run/##bY/BDoIwEETv/YrxBBgQvYrCR3gUkyKuWlMs6VZP/ju2QQ8atoedTCevnVvzbLi1qnfZ3ZxoGDQ5sLPYQpqHw2aWZXDEDl6UAjjqBhMXZ2OmbNF6KK5kCVmJWobzGxk9MS58ZjLyyYi/J0ZXFkLkOdQZ3QqKweRSKBdx@BWrE4lQzBL7Yr7ewlKvm5biXEq5r7neHeaVl6840L@Gxyf5JUXcLVPPTbAtA75CFGGNbpkUoeCdjaaFNpfY45NiGN4) (just a quick attempt to illustrate the idea) – bobble bubble Dec 14 '22 at 09:11
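
For readers who can't open the linked demos, here is a minimal Node.js sketch of the "match what you don't want, capture what you need" idea from the comments, applied to the stated goal of deleting comments outside code blocks. It is not the code from the linked demo. The function name `stripHtmlCommentsOutsideFences` is made up for illustration, and the sketch assumes every code block is closed by a second run of three backticks.

```javascript
// Hypothetical helper (name not from the thread): removes HTML comments
// unless they sit inside a ``` fenced block.
function stripHtmlCommentsOutsideFences(text) {
  // Left alternative: a whole fenced block, consumed so that nothing inside
  // it can be matched separately. It has no capture group.
  // Right alternative: an HTML comment, captured in group 1.
  const pattern = /```[\s\S]*?```|(<!--[\s\S]*?-->)/g;

  return text.replace(pattern, (match, comment) =>
    // Group 1 is undefined when the fenced-block alternative matched:
    // put the block back unchanged. Otherwise drop the comment.
    comment === undefined ? match : ""
  );
}

const sample = [
  "intro <!-- remove me -->",
  "```",
  "<!-- keep me -->",
  "```",
  "<!-- remove me too --> outro",
].join("\n");

console.log(stripHtmlCommentsOutsideFences(sample));
```

Because the fenced block is matched as a single unit, an opening fence is always consumed together with its closing one, which is the point Kaddath makes above. One caveat of this sketch: a fence that is never closed is not treated as a code block, so comments after it are still removed.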
