
Why is:

[\s\S]+?

Much more efficient than:

(?:.|\n)+?

What are the differences between the two in terms of how they work behind the scenes?
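One way to look at this is a small benchmark. The following is a minimal sketch in Python's `re` module, not in whatever engine the question was originally about. The `START`/`END` delimiters and the sample text are hypothetical, chosen only so that both lazy patterns have something to stop at. As far as I understand CPython's implementation, `[\s\S]+?` compiles to a single-character repeat over a character set, which the matcher can run as a tight loop with one set-membership test per character. `(?:.|\n)+?` compiles to a general repeat around a `BRANCH`: every iteration saves repeat/backtracking state, tries `.` first, and only falls back to the `\n` branch after `.` fails on a newline. Compiling with `re.DEBUG` prints the opcode trees so you can compare them yourself; the exact printout differs between Python versions, and step counts in other engines (for example the ones regex101 reports) will not be identical.

```python
import re
import timeit

# Character class: a single set-membership test per character.
CLASS_PATTERN = re.compile(r"START([\s\S]+?)END")
# Alternation: each repetition enters a group, tries "." and then "\n".
ALT_PATTERN = re.compile(r"START((?:.|\n)+?)END")

# Hypothetical input: a multi-line body between two made-up delimiters.
TEXT = "START" + "some text\n" * 2_000 + "END"


def capture(pattern, text=TEXT):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


def bench(pattern, number=50):
    """Total seconds for `number` searches; absolute values vary by machine."""
    return timeit.timeit(lambda: capture(pattern), number=number)


if __name__ == "__main__":
    # Without re.DOTALL, Python's "." excludes only "\n",
    # so both patterns capture exactly the same text here.
    assert capture(CLASS_PATTERN) == capture(ALT_PATTERN)

    # Dump the parsed opcode trees (format differs between Python versions).
    re.compile(r"[\s\S]+?", re.DEBUG)
    re.compile(r"(?:.|\n)+?", re.DEBUG)

    for label, pat in (("[\\s\\S]+?", CLASS_PATTERN), ("(?:.|\\n)+?", ALT_PATTERN)):
        print(f"{label:<12} {bench(pat):.4f} s")
```

Only the relative timings are meaningful, and they depend on the Python version and the input.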

Note: this is with DOTALL turned off. Also, from https://www.regular-expressions.info/dot.html:

JavaScript and VBScript do not have an option to make the dot match line break characters. In those languages, you can use a character class such as [\s\S] to match any character. This character matches a character that is either a whitespace character (including line break characters), or a character that is not a whitespace character. Since all characters are either whitespace or non-whitespace, this character class matches any character.
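The quoted argument (every character is either whitespace or not, so `[\s\S]` matches anything) can be checked mechanically. Below is a small sketch, again in Python rather than JavaScript, that tests each code point in the Basic Multilingual Plane against `[\s\S]`, a plain `.`, and `.` with `re.DOTALL`. The `limit` parameter and the BMP-only default are my own choices to keep the run short. In Python, `.` without `DOTALL` excludes only `\n`. (JavaScript has since gained an `s`, or dotAll, flag in ES2018, so the quoted page's statement reflects older versions of the language.)

```python
import re

# [\s\S]: a set holding a class and its complement, so it covers every character.
ANY_CLASS = re.compile(r"[\s\S]")
# "." with default flags: in Python this excludes only "\n".
DOT = re.compile(r".")
# "." with re.DOTALL: matches "\n" as well.
DOT_ALL = re.compile(r".", re.DOTALL)


def unmatched(pattern, limit=0x10000):
    """Return the code points below `limit` that `pattern` does not fully match."""
    return [cp for cp in range(limit) if not pattern.fullmatch(chr(cp))]


if __name__ == "__main__":
    for name, pat in (("[\\s\\S]", ANY_CLASS), (".", DOT), (". + DOTALL", DOT_ALL)):
        missed = unmatched(pat)
        print(f"{name:<11} misses {len(missed)} code point(s): {[hex(cp) for cp in missed]}")
```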

David542
  • First version is single step, whereas second version is 3 steps, according to regex101 – Seblor Dec 13 '19 at 19:40
  • Also see https://stackoverflow.com/questions/4724588/using-alternation-or-character-class-for-single-character-matching?rq=1 – ctwheels Dec 13 '19 at 19:49
  • Those are not equivalent expressions. I think you need `(?:.|\r?\n)+?` – MonkeyZeus Dec 13 '19 at 19:53
  • Never use alternation when you can use a character set instead. Alternation causes the engine to split into matching two subpatterns, which causes extra overhead due to backtracking. A character set simply looks for one of the characters and moves on. Also note that `[\s\S]+?` is not functionally synonymous with `(?:.|\n)+?` A set containing a class and its negation such as `[\s\S]` will match *anything* that isn't zero-length. `.|\n` is more specific, and excludes newline characters other than `\n`, such as `\r`. – CAustin Dec 13 '19 at 19:54
  • https://stackoverflow.com/q/4724588/2191572 is also relevant – MonkeyZeus Dec 13 '19 at 19:57
  • [Here is a benchmark](https://stackoverflow.com/a/4724840/372239) – Toto Dec 14 '19 at 10:23
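Whether `(?:.|\n)` really equals `[\s\S]` depends on which characters the engine's `.` excludes when DOTALL is off. Python's `.` excludes only `\n`, so in Python the two agree; JavaScript's `.` without the `s` flag excludes all four ECMAScript line terminators (`\n`, `\r`, `\u2028`, `\u2029`). The sketch below is an assumption-laden emulation: it stands in for JavaScript's dot with the explicit class `[^\n\r\u2028\u2029]` (the name `JS_DOT` and the `START`/`END` delimiters are hypothetical) and runs it in Python to show how `(?:.|\n)+?` fails on CRLF and lone-CR text, and how the `\r?\n` variant suggested above fixes CRLF but still not a lone `\r`.

```python
import re

# Hypothetical emulation of JavaScript's "." without the s flag:
# any character except the four ECMAScript line terminators.
JS_DOT = r"[^\n\r\u2028\u2029]"

ANY = re.compile(r"START([\s\S]+?)END")
DOT_OR_LF = re.compile(rf"START((?:{JS_DOT}|\n)+?)END")
DOT_OR_CRLF = re.compile(rf"START((?:{JS_DOT}|\r?\n)+?)END")


def capture(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


# The same two lines with three different line-ending conventions.
unix = "START\na\nb\nEND"
windows = "START\r\na\r\nb\r\nEND"
old_mac = "START\ra\rb\rEND"

if __name__ == "__main__":
    for label, text in (("LF", unix), ("CRLF", windows), ("CR", old_mac)):
        results = [capture(p, text) for p in (ANY, DOT_OR_LF, DOT_OR_CRLF)]
        print(label, [repr(r) for r in results])
```

Only the character class is independent of line-ending conventions, which is one more reason to prefer it over the alternation when DOTALL is unavailable.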
