
I currently use /^<!--\n((.|\n)*)\n-->/, but it captures every line between the first <!-- and the last -->, instead of each comment separately.

<!--
Title: Foo
-->

# This is a test

<!--
Title: Bar
-->
sunknudsen
  • Check about greedy vs non-greedy modifiers. – Al.G. Sep 09 '20 at 18:26
  • @Al.G.: Quantifiers. – Casimir et Hippolyte Sep 09 '20 at 18:30
  • @CasimiretHippolyte will look it up, thanks! In the meantime, do you mind suggesting an answer? I spent a lot of time trying to figure this out. – sunknudsen Sep 09 '20 at 19:01
  • @sunknudsen: instead of `(.|\n)*?`, which is perfectly correct but not very performant with regex engines like the JavaScript one, use `[\s\S]*?` (a character class with all whitespace characters `\s` and all non-whitespace characters `\S`, in other words: all characters, including newline). – Casimir et Hippolyte Sep 09 '20 at 19:26

2 Answers


You're missing the lazy ? modifier after the * quantifier, as the comments pointed out. With it, (.|\n)*? stops at the first \n--> instead of the last. That's all.

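// The g flag finds every comment; the m flag lets ^ match at the start of each line.
const regex = /^<!--\n((.|\n)*?)\n-->/gm;
const string = `<!--
Title: Foo
-->

# This is a test

<!--
Title: Bar
-->`;
// With the g flag, match() returns the full text of each match (delimiters included).
const matches = string.match(regex);
console.log(matches);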
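
If only the text between the delimiters is wanted, a minimal sketch along these lines may help. It assumes an environment that supports String.prototype.matchAll (e.g. Node 12+ or a current browser). The names commentPattern, markdown, and bodies are illustrative, and the inner group is made non-capturing so that group 1 is the only capture.

```javascript
// Same lazy pattern as above, except that the inner group is non-capturing.
const commentPattern = /^<!--\n((?:.|\n)*?)\n-->/gm;
const markdown = `<!--
Title: Foo
-->

# This is a test

<!--
Title: Bar
-->`;

// matchAll() yields one match object per comment; index 1 is the captured body.
const bodies = Array.from(markdown.matchAll(commentPattern), (m) => m[1]);
console.log(bodies); // [ 'Title: Foo', 'Title: Bar' ]
```

By contrast, match() with the g flag returns only the full matches, so the <!-- and --> delimiters stay in each result.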
GirkovArpa

You use a greedy quantifier (quantifiers are greedy by default) instead of a lazy/non-greedy/reluctant quantifier. Many posts cover this problem, but it isn't the only problem with your pattern.

You use (.|\n)* to match across multiple lines. That is correct, but not efficient for backtracking regex engines (you can use it with grep or sed, but it is better avoided with JavaScript/PHP/Python/Ruby...). One alternative is [\s\S]*, but in your particular case you can also use a more descriptive subpattern, since the opening tag is followed by a newline and the closing tag is preceded by a newline.
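
To make the greedy/lazy difference concrete with the [\s\S] class, here is a small sketch run against the sample from the question. The variable names (sampleDoc, greedyBody, lazyBody) are only illustrative.

```javascript
const sampleDoc = `<!--
Title: Foo
-->

# This is a test

<!--
Title: Bar
-->`;

// Greedy: [\s\S]* runs to the end of the input, then backtracks to the LAST "\n-->".
const greedyBody = sampleDoc.match(/^<!--\n([\s\S]*)\n-->/)[1];

// Lazy: [\s\S]*? grows one character at a time and stops at the FIRST "\n-->".
const lazyBody = sampleDoc.match(/^<!--\n([\s\S]*?)\n-->/)[1];

console.log(JSON.stringify(greedyBody));
// "Title: Foo\n-->\n\n# This is a test\n\n<!--\nTitle: Bar"
console.log(JSON.stringify(lazyBody));
// "Title: Foo"
```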

To reduce the number of tests needed by something like \n[\s\S]*?\n--> (which checks, for each character consumed by [\s\S]*?, whether \n--> follows), you can replace [\s\S]*? with .*(?:\n.*)*?. This way, \n--> is only tested once per line instead of once per character.

The pattern becomes /^<!--\n(.*(?:\n.*)*?)\n-->/gm.
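
As a quick check that the line-by-line pattern still handles bodies spanning several lines, here is a sketch. It assumes String.prototype.matchAll is available, and the "Author: Baz" line is a made-up addition to the question's sample.

```javascript
const unrolled = /^<!--\n(.*(?:\n.*)*?)\n-->/gm;

// Hypothetical input: the first comment has a two-line body.
const text = `<!--
Title: Foo
Author: Baz
-->

# This is a test

<!--
Title: Bar
-->`;

// .* consumes the first line; (?:\n.*)*? adds one whole line at a time
// until "\n-->" follows.
const captured = Array.from(text.matchAll(unrolled), (m) => m[1]);
console.log(captured); // [ 'Title: Foo\nAuthor: Baz', 'Title: Bar' ]
```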

But there's still a problem: what about empty comments, since this pattern has two mandatory newlines? (An empty comment has only one newline.)

You can use: /^<!--\n(.*(?:\n.*)*?)\n?^-->/gm

This pattern makes the last \n optional so that an empty comment matches and the capture is the empty string, using a simple optional quantifier rather than an alternation. On the other hand, to ensure that the newline is still present when the comment isn't empty, I added the anchor ^ before -->.
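
A small sketch of how this final pattern behaves on empty comments, with made-up inputs (emptyLast and emptyFirst are illustrative names) and assuming matchAll is available. The second input shows a caveat I found by tracing the backtracking order, so it is worth checking against your own data.

```javascript
const finalPattern = /^<!--\n(.*(?:\n.*)*?)\n?^-->/gm;

// Hypothetical input: a regular comment followed by an empty one.
const emptyLast = `<!--
Title: Foo
-->

<!--
-->`;
const emptyLastBodies = Array.from(emptyLast.matchAll(finalPattern), (m) => m[1]);
console.log(emptyLastBodies); // [ 'Title: Foo', '' ]

// Caveat: when an empty comment comes BEFORE another comment, the first .*
// can consume the "-->" line itself, and the lazy group then extends to the
// next "-->", so both comments end up in a single match.
const emptyFirst = `<!--
-->

<!--
Title: Foo
-->`;
const overMatched = Array.from(emptyFirst.matchAll(finalPattern), (m) => m[1]);
console.log(overMatched.length); // 1
console.log(JSON.stringify(overMatched[0])); // "-->\n\n<!--\nTitle: Foo"
```

If empty comments can appear before other comments in your input, one option would be to prevent the body lines from starting with -->, for example with a negative lookahead, but that goes beyond the pattern given here.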

Casimir et Hippolyte