2

Most languages allow fixed-length or finite-length lookbehind. One notable exception is .NET, which allows the use of the * operator.

However, .NET regexs can already recognize balanced parentheses using named capture, which is not a regular language. Are regexs still regular with * in lookbehind? Extended answers for subexpressions other than * (for example, additional lookaround!) would also be appreciated.

tl;dr: Do regexs stay regular with * in lookbehind?

Zachary Vance
  • 752
  • 4
  • 18

3 Answers3

1

I believe the answer here: Does lookaround affect which languages can be matched by regular expressions? can be extended to prove that adding * in lookbehind (or even nesting such lookbehinds and lookaheads) does not affect the 'regularness' of the expressions. I haven't put more thought into it though.

Hope that helps!

Community
  • 1
  • 1
0

.NET's unbounded lookbehind is merely a refinement of an already non-regular feature: fixed, finite or infinite, lookbehinds have no place in a regular grammar. Nor do lookaheads, capturing groups, backreferences, reluctant quantifiers, possessive quantifiers, atomic groups, conditionals, word boundaries, anchors...

If we had to limit ourselves to theoretically-pure regular expressions, 99.9% of current regex users would have no use for them. Asking if a feature is "regular" is a waste of breath; is it useful? That's all that matters.

Alan Moore
  • 73,866
  • 12
  • 100
  • 156
  • -1 Many of the features you just listed (including lookaround) are perfectly regular. Also calling the question a waste of breath is a) rude and b) wrong as the regularness of a regular expression has practical implications: there are some things you can do with regular regular expressions, that you can't do with irregular regular expressions. For example: find their intersection. – sepp2k Jul 29 '10 at 09:45
  • 3
    These features are not in the traditional formulation of regular expressions, but they are still regular. For anyone that reads this question at some later time: Lookaheads (infinite-length), reluctant quantifiers, possessive quantifiers, atomic groups, word boundries, and anchors are regular. Capturing groups are essentially irrelevant. Backreferences, given capturing groups, are non-regular. – Zachary Vance Jul 29 '10 at 12:52
0

Regular expressions are closed under intersection. Add a new symbol & and rewrite the lookbehind: A(?<B)C as (?:AC&.*BC), and we get that lookbehind is regular.

B can include clearly use anything that doesn't go past the A/C boundry. That is, anything except lookahead. What happens if lookbehind may use lookahead, or vice-versa? Start work on .*BC . You're still fine.

So, regular expressions could really add in intersection and infinite-length lookaround (which can include more lookaround to any depth) and it would still be just as efficient.

Zachary Vance
  • 752
  • 4
  • 18