1

I'm working on a tool for manipulating context-free languages, and the internal representation of a grammar is stored as a finite automaton. Looking farther into EBNF and RegEx, I learned that EBNF has "exceptions" and RegEx has negative lookahead. I can see how these might be modeled by a symmetric difference NFA, but had suspected they are beyond the capabilities of a regular DFA or NFA.

But then I came across this which pretty plainly suggests that I was wrong. Can anyone point to a free resource that might show how to convert an EBNF with exceptions, or a RegEx with negative lookahead, into a DFA?

Brent
  • 4,153
  • 4
  • 30
  • 63

1 Answers1

4

You can replace a negative lookahead with a positive lookahead on the full set of possible matches minus the negative lookahead match. E.g. if we were working with single characters a-z as the match space, /what(?!n)/ is the same as /what(?=[a-mo-z]|$)/ (the $ is needed as end-of-string is one of the possible matches). This is OK in theory but not so great in practice when dealing with larger matches, like /afraid of (?!chinese)/.

https://cs.stackexchange.com/questions/2557/how-to-simulate-backreferences-lookaheads-and-lookbehinds-in-finite-state-auto gives a good treatment of converting lookaheads into something DFA-like, with an important caveat at the end.

Allen Luce
  • 7,859
  • 3
  • 40
  • 53
  • `/what(?=[a-mo-z])/` is not the same as `/what(?!n)/` because it doesn't match the string "what". You have to always anticipate the end of the string, so it has to be `/what(?=[a-mo-z]|$)/` where `$` matches the end of the string. – Michael Schmidt Mar 08 '20 at 12:22
  • Good catch! Edited. – Allen Luce Mar 09 '20 at 02:28