You ask several questions. The answer to one has appeared before in SO. Non-matching parens are not provided by sed
.
To answer another, you need only read re2
's documentation closely. The mechanism used by branch-and-bound matchers like sed
's (and also Perl
, Python
, Java
, etc.) and the Deterministic Finite Automaton (DFA)-based matchers like re2
are intrinsically different. There is no cut
operation in sed
's recognizer that will do what you want.
Having said that, the re2
documentation omits its negatives. Compiling a DFA is much more work than converting a regex to the bytecode used internally by e.g. Perl
's matcher. Thus Perl programs are not slowed down by regex compilation. In fact the re2
compiler can "blow up" on certain short regexes and produce a DFA of size exponential in the size of the regex. Thus the compiler takes exponential time to run, and the re2
technique moves the bad behavior from runtime to compilation.
I do agree with the re2
guys that in general it's much better to have the bad behavior depend on the regex than the input. It would be a better, safer world if all the embedded regex recognizers in programming languages used the re2
approach.
Finally, your questions seem to conflate size and run time. The DOS from a backtracking recognizer occurs because certain short inputs require exponential time in the input length to recognize (or reject). Since the regex can never capture more than the (short) input length, capturing or lack of capturing has no effect.
The other kind of DOS you may be thinking of is a user providing a monster input that the sed
recognizer is forced to store internally because it has only noncapturing groups, even though the capture is never used. This is certainly a way to make a problem for at least some implementations of sed
(hypothetically a smart implementation could determine that the capture is not needed and skip it; I don't think the GNU code does that), but it only happens if you allow huge inputs, which can normally be prevented by other means. Why are the no non-capturing groups in sed
? It is historically a very old program going back to some of the very first Unix machines. People didn't worry about DOS in those days.