0

I'm parsing text using Java. I've defined a grammar below:

Start := "(\\<)"
Stop := "(\\>)"
Var = "(\\w*)";
Cons = "([0-9]*)";

Type1 := Start ((Var | Cons) | TypeParent) (Type1 ((Var | Cons) | TypeParent))* Stop
Type2 := Start ((Var | Cons) | TypeParent) (Type2 ((Var | Cons) | TypeParent))* Stop

TypeParent := Type1 | Type2

...
etc

I want to combine all the regexes into a single String pattern and match all at once. My problem is when I start using the recursive grammar elements in the Type1 and Type2 lines. I obviously can't feed a recursive definition into a Pattern in Java - it's just a String with the regex symbols.

What I'd like is that I could somehow have a logical switch that said that if in this block:

(Type2 ((Var | Cons) | TypeParent)

all patterns were matched except Type2, that I could capture all the other groups, but then extract the string of characters where Type2 token should be and then recursively feed it into the regexer again. Eventually I'd get down to a base case of:

(Var | Cons) | TypeParent)

I realize that this isn't what regex is meant to do - this is now a context free grammar (?) since it is recursive. But short of thinking of a super clever parser, I think this method is hackable.

Thoughts?

lollercoaster
  • 15,969
  • 35
  • 115
  • 173

2 Answers2

5

You are correct. This isn't what regular expressions are meant to do. The instant you introduce recursion you are out of the land of regular expressions, DFAs, and into the land of context-free languages, DPDAs, parsers. You need a stack to handle recursion. And no, it isn't hackable.

There's nothing 'super-clever' about the parser for this grammar. It is considerably simpler than what you've done so far.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • @lollercoaster A grammar with three productions isn't complex. You don't need a state machine at all. You can do it in recursive descent. I'd advise you to stop guessing and start learning what these techniques really mean. – user207421 Oct 05 '12 at 22:22
  • you are correct. recursive descent was pretty easy. implementation with the shunting yard algorithm -> RPN -> AST tree did exactly the trick. thanks! – lollercoaster Oct 06 '12 at 18:10
3

It's so much easier to use the right tool for the job. Try CUP, ANTLR, or JavaCC. Here is an ANTLR example.

Community
  • 1
  • 1
Joe Coder
  • 4,498
  • 31
  • 41