NEVER use (.|\n)*? in a pattern. It is an unfortunate, widely known construct that causes so much backtracking that it often leads to situations like this one: when the text is long and specific enough, the result is catastrophic backtracking.
Note that even [\w\W]*? (or [\s\S\r]*?, see Multi-line regular expressions in Visual Studio Code) might already suffice here. Although it also involves quite a lot of backtracking, it is much more efficient.
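As a rough illustration outside the editor, the sketch below uses Python's re module (an assumption: VS Code uses different regex engines, so the cost of each variant will differ) to show that the alternation and the character-class variants capture exactly the same text. The sample string and the dictionary keys are made up for the example, and the VS Code-specific \r is left out because Python character classes match line breaks anyway.

```python
import re

# Hypothetical multi-line input; the closing tag is several lines away.
text = "<tag>line 1\nline 2\nline 3</tag> trailing"

# Three ways to say "any char, line breaks included, as few as possible".
patterns = {
    "alternation": r"<tag>((?:.|\n)*?)</tag>",    # the pattern to avoid
    "word/non-word": r"<tag>([\w\W]*?)</tag>",
    "space/non-space": r"<tag>([\s\S]*?)</tag>",
}

# Every variant captures the same text; they differ only in how much work
# the engine has to do to get there.
captures = {name: re.search(p, text).group(1) for name, p in patterns.items()}

for name, captured in captures.items():
    print(f"{name}: {captured!r}")
```

Since the results are identical, swapping the alternation for a character class is a safe first step before moving on to a fully unrolled pattern.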
What usually works better is an unrolled pattern, such as
<tag>(?:[^<\r]*(?:<(?!/tag>)[^<]*)*</tag>){4}
Instead of (.|\n)*?, it uses a series of subpatterns, each of which can only match at distinct positions in the string.
Details
<tag> - a literal string
(?:[^<\r]*(?:<(?!/tag>)[^<]*)*</tag>){4} - four repetitions of:
    [^<\r]* - 0 or more chars other than <, line break chars included. The \r ensures this in VS Code regex: its presence lets every character class in the pattern that can match newlines actually match them, so \r does not need to be repeated in the next character class.
    (?:<(?!/tag>)[^<]*)* - 0 or more repetitions of a < not followed by /tag>, and then 0 or more chars other than <
    </tag> - a literal </tag> string.
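The breakdown above can be checked outside VS Code with a minimal sketch like the following. It assumes Python's re module rather than the VS Code engine, so the \r is dropped from the first character class (in Python, [^<] already matches line breaks). The sample input and the variable names are hypothetical.

```python
import re

# The unrolled pattern from the answer, minus the VS Code-specific \r.
UNROLLED = re.compile(r"<tag>(?:[^<]*(?:<(?!/tag>)[^<]*)*</tag>){4}")

# The lazy character-class alternative, for comparison.
LAZY = re.compile(r"<tag>(?:[\w\W]*?</tag>){4}")

# Hypothetical sample: four <tag> elements, some spanning several lines
# or containing other tags.
sample = (
    "<root>\n"
    "<tag>one</tag>\n"
    "<tag>two <b>bold</b></tag>\n"
    "<tag>three\nspans lines</tag>\n"
    "<tag>four</tag>\n"
    "</root>\n"
)

unrolled_match = UNROLLED.search(sample)
lazy_match = LAZY.search(sample)

# Both patterns select the same span on this input; the unrolled one
# simply leaves the engine fewer positions to retry.
same_span = unrolled_match.span() == lazy_match.span()
print(same_span)
print(unrolled_match.group(0))
```

Each part of the unrolled pattern can only continue at one kind of position (a non-< char, a < that does not start /tag>, or the closing tag itself), which is why the engine has few alternatives to go back to.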
Having said that, you might also be interested in the Emmet: Balance (outward) command:
A well-known tag balancing: searches for tag or tag's content bounds
from current caret position and selects it. It will expand (outward
balancing) or shrink (inward balancing) selection when called multiple
times. Not every editor supports both inward and outward balancing due
of some implementation issues, most editors have outward balancing
only.
Emmet’s tag balancing is quite unique. Unlike other implementation,
this one will search tag bounds from caret’s position, not the start
of the document. It means you can use tag balancer even in non-HTML
documents.