Can a RegEx with negative lookahead be represented as a finite automaton?

Question

I'm working on a tool for manipulating context-free languages, and the internal representation of a grammar is stored as a finite automaton. Looking farther into EBNF and RegEx, I learned that EBNF has "exceptions" and RegEx has negative lookahead. I can see how these might be modeled by a symmetric difference NFA, but had suspected they are beyond the capabilities of a regular DFA or NFA.

But then I came across this which pretty plainly suggests that I was wrong. Can anyone point to a free resource that might show how to convert an EBNF with exceptions, or a RegEx with negative lookahead, into a DFA?

Allen Luce · Accepted Answer · 2020-03-09T02:28:25.240

4

You can replace a negative lookahead with a positive lookahead on the full set of possible matches minus the negative lookahead match. E.g. if we were working with single characters a-z as the match space, /what(?!n)/ is the same as /what(?=[a-mo-z]|$)/ (the $ is needed as end-of-string is one of the possible matches). This is OK in theory but not so great in practice when dealing with larger matches, like /afraid of (?!chinese)/.

https://cs.stackexchange.com/questions/2557/how-to-simulate-backreferences-lookaheads-and-lookbehinds-in-finite-state-auto gives a good treatment of converting lookaheads into something DFA-like, with an important caveat at the end.

edited Mar 09 '20 at 02:28

answered Nov 28 '14 at 20:34

Allen Luce

7,859
3
40
53

`/what(?=[a-mo-z])/` is not the same as `/what(?!n)/` because it doesn't match the string "what". You have to always anticipate the end of the string, so it has to be `/what(?=[a-mo-z]|$)/` where `$` matches the end of the string. – Michael Schmidt Mar 08 '20 at 12:22
Good catch! Edited. – Allen Luce Mar 09 '20 at 02:28

Can a RegEx with negative lookahead be represented as a finite automaton?

1 Answers1