I'm parsing text using Java. I've defined a grammar below:
Start := "(\\<)"
Stop := "(\\>)"
Var := "(\\w*)"
Cons := "([0-9]*)"
Type1 := Start ((Var | Cons) | TypeParent) (Type1 ((Var | Cons) | TypeParent))* Stop
Type2 := Start ((Var | Cons) | TypeParent) (Type2 ((Var | Cons) | TypeParent))* Stop
TypeParent := Type1 | Type2
...
etc
I want to combine all the regexes into a single String pattern and match them all at once. My problem starts when I use the recursive grammar elements in the Type1 and Type2 lines. I obviously can't feed a recursive definition into a Pattern in Java; it's just a String containing the regex symbols.
What I'd like is some kind of logical switch, so that if, in this block:
(Type2 ((Var | Cons) | TypeParent))
every pattern matched except Type2, I could capture all the other groups, then extract the substring where the Type2 token should be and recursively feed it into the regex matcher again. Eventually I'd get down to a base case of:
((Var | Cons) | TypeParent)
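To make the idea concrete, here is a minimal sketch of the kind of recursion I have in mind. It does not implement the grammar above; it assumes a simplified input where a block is `<` followed by whitespace-separated leaves or nested blocks, then `>`. The class name NestedParser and the helper findClose are hypothetical. A regex handles only the leaf tokens, and the nested substring is found by counting brackets and then fed back into the same method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: regex for leaves, plain recursion for nesting.
final class NestedParser {
    // A leaf is a run of word characters (a Var, or a Cons if it is all digits).
    // Uses + rather than * so that an empty string is never accepted as a token.
    private static final Pattern LEAF = Pattern.compile("\\w+");

    /** Parses one "<...>" block; each nested block becomes a nested list. */
    static List<Object> parse(String text) {
        String s = text.trim();
        if (s.length() < 2 || s.charAt(0) != '<' || s.charAt(s.length() - 1) != '>') {
            throw new IllegalArgumentException("Expected <...> but got: " + text);
        }
        String body = s.substring(1, s.length() - 1);
        List<Object> items = new ArrayList<>();
        Matcher leaf = LEAF.matcher(body);
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '<') {
                // Extract the balanced substring and recurse on it.
                int end = findClose(body, i);
                items.add(parse(body.substring(i, end + 1)));
                i = end + 1;
            } else if (leaf.region(i, body.length()).lookingAt()) {
                // The regex is anchored at position i and consumes one leaf.
                items.add(leaf.group());
                i = leaf.end();
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at " + i);
            }
        }
        return items;
    }

    /** Returns the index of the '>' that closes the '<' at position open. */
    private static int findClose(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            if (s.charAt(i) == '<') {
                depth++;
            } else if (s.charAt(i) == '>' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unbalanced '<' at " + open);
    }
}
```

Calling NestedParser.parse("<a <b 12> c>") returns a list with the leaf a, a nested list holding b and 12, and the leaf c. The bracket counting is the part a single Pattern can't do for me, which is why it sits outside the regex here.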
I realize that this isn't what regex is meant to do: since the definition is recursive, this is now a context-free grammar (?). But short of coming up with a super clever parser, I think this method is hackable.
Thoughts?