0

I am not asking if there is greedy regex in sed, I already know that there is not. What I am asking is : It's known that sed is the best or one of the best stream editors that exists. So why the developers of this tool didn't implement the non greedy regex. It looks simple comparing to all the things this tool can do.

Sidahmed
  • 792
  • 1
  • 12
  • 23
  • 2
    FYI, "Why are things the way they are"-type questions generally don't do too well on here because the answers are mostly [subjective](http://stackoverflow.com/help/dont-ask). – 0x5453 Oct 18 '16 at 20:42
  • I just thought that some of the devs, or some one heard a similar answer from the devs can tell me why – Sidahmed Oct 18 '16 at 20:43
  • Non-greedy regexes were first made mainstream with PCRE. The specification for `sed` predates PCRE by decades. – Charles Duffy Oct 18 '16 at 20:43
  • @0x5453: The answer to this question isn't subjective at all. I don't see a reason to downvote this question or to close it. – Casimir et Hippolyte Oct 18 '16 at 20:46
  • @CasimiretHippolyte, despite having answered it, I'm personally not so certain that this is a good question. To get an answer that went beyond the standards document into individual implementations, we'd need to poll the folks behind major `sed` implementations for their reasoning (and to assess the intent of *potential* contributors who may have but did not submit patches). I believe that there's consensus that "only the implementor can answer" questions aren't well-suited to the format, though I'd need to poke around a bit more on meta to find references re: same. – Charles Duffy Oct 18 '16 at 20:54
  • @CharlesDuffy: I think you should quickly explain in your answer that there are several type of regex engines for different uses, and that non-greedy quantifiers are a particularity of backtracking engines. That's why I don't think the answer is subjective, It's due to a technology choice: the one has the feature, the other doesn't have it, implementor reasons aren't very important. – Casimir et Hippolyte Oct 18 '16 at 21:04
  • ...but implementor reason is exactly what the OP is explicitly requesting! – Charles Duffy Oct 18 '16 at 21:07
  • @CharlesDuffy: because he doesn't know that "non-greediness" isn't a simple feature that can be added to anything you want, there are underlying technical reasons. It's only the way he find, with his actual knowledge to formulate his question. – Casimir et Hippolyte Oct 18 '16 at 21:21
  • Possible duplicate of https://stackoverflow.com/questions/11856054/bash-easy-way-to-pass-a-raw-string-to-grep – tripleee Nov 07 '18 at 06:11

1 Answers1

7

History

Non-greedy matching is a feature of Perl-Compatible Regular Expressions. PCREs were only available as part of the Perl language until the 1997 implementation of libpcre, whereas the POSIX implementation of sed was first introduced in 1992 -- and the implementation of standard-C-library regular expression routines which it references predates even that, having been published in 1988.


Standards-Body Definitions

The POSIX specification for sed supports BRE; only BRE ("Basic Regular Expressions") and ERE ("Extended Regular Expressions") are specified in POSIX at all, and neither form contains non-greedy matching.

Thus, for PCRE support (or, otherwise, non-greedy matching support) to be standardized for inclusion in all sed implementations, it would first need to be standardized in the POSIX regular expression definition.

However, it's highly unlikely that this would occur in practice (except as an extension to be present, or not, at the implementor's option), given the practical reasons for which PCRE support can be undesirable; see the following section:


Implementation Considerations

  • sed is typically considered a "core tool", thus implemented with only minimal dependencies. Requiring libpcre in order to install sed thus makes libpcre a part of your operating system that needs to be included even in images where size is at a premium (initrd/initramfs images, etc).
  • Multiple strategies for implementing regular expressions are available. The historical (very high-performance) implementation compiles the expression into a nondeterministic finite automata which can be executed in O(n) time against a string of size n given a fixed regular expression. The libpcre implementation uses backtracking -- which permits for easier implementation of features such as non-greedy matching, lookahead, and lookbehind -- but can often have far-worse-than-linear performance).

See https://swtch.com/~rsc/regexp/regexp1.html for a discussion of the performance advantages of Thompson NFAs over backtracking implementations.

Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
  • So, it is a problem of speed to ?? some greedy regex are slow ?? – Sidahmed Oct 18 '16 at 20:58
  • The easiest way to implement non-greedy regexes is with a backtracking algorithm, and backtracking is in some cases much, much slower than tradational implementations (ie. Thompson NFAs). It's possible to build a NFA implementation that allows backtracking, however; the article linked goes into some detail on that point. – Charles Duffy Oct 18 '16 at 21:03