Parse lambda calculus style function applications with LL1 parser

Question

I'm using TinyPG, which is an LL1 parser generator, to parse lambda calculus. I'm trying to write a rule that will parse function application like (a b) or (a b c) and so on.

So far I have this rule (a bit simplified):

APPLICATION -> LPARENTHESES VARIABLE (SPACE+ VARIABLE)+ RPARENTHESES;

But that'd fail to parse a term that has spaces after the left and before the right brackets: ( a b ). I can allow spaces after the opening bracket like this:

APPLICATION -> LPARENTHESES SPACE* VARIABLE (SPACE+ VARIABLE)+ RPARENTHESES;

But I'm having trouble setting it to allow spaces before the closing bracket. I came up with this, which seems to work:

ARG_LIST        -> (RPARENTHESES | (SPACE+ (RPARENTHESES | (VARIABLE ARG_LIST))));
APPLICATION     -> LPARENTHESES SPACE* VARIABLE ARG_LIST;

But is messy and recursive, which will then make it hard to read and compile the nodes. Is there any non recursive or at least simpler way to parse this?

Did you look at the whitespace handling in the tutorial? (Search for `[Skip]`) — rici, May 13 '15 at 01:57
Yes, but that'll skip every whitespace, including the ones between each argument which shouldn't be skipped. — Juan, May 13 '15 at 01:59
Why shouldn't they be skipped? Skipped does not mean "deleted": it means "recognized and then ignored". The VARIABLE token will still end when a whitespace character is encountered. Try it and see. — rici, May 13 '15 at 02:03
I just tried it. It works only when I define the skipping space after I define the non skipping space (the one I'd use between arguments). I cannot use the same space both for skipping and between arguments. Otherwise it'll just fail to parse it. I'll do it this way if there's no other way, it just feels a bit hacky. — Juan, May 13 '15 at 02:09
Just leave out the non-skipping space. You don't need to tell the parser that arguments need to have spaces between them. — rici, May 13 '15 at 02:11
Oh I see! Feel free to post it as an answer and I'll accept it. Although I still wonder if there was a way to do it without the Skip thing. — Juan, May 13 '15 at 02:12
I suppose you could try blitzing your grammar with `WHITESPACE?`, but that would be distracting and inefficient, and it doesn't add anything to the readability of the grammar. Also, it would almost certainly leave you with a grammar which was no longer LL(1). Skipping whitespace is not a hack. Honest. — rici, May 13 '15 at 02:22

score 1 · Accepted Answer · answered May 13 '15 at 02:19

There is no reason to confuse the parser with whitespace. It is sufficient for it to be ignored in the scanner using the [Skip] attribute as shown in the tutorial:

[Skip] WHITESPACE -> @"\s+";

"Skip" does not mean "delete". It means that the scanner should recognize the token and then ignore it. If you skip whitespace, whitespace will still separate alphanumeric tokens just fine. You just don't need to include the space token in your grammar, leaving you with:

APPLICATION -> LPARENTHESES VARIABLE VARIABLE+ RPARENTHESES;

(Actually, empty application lists are usually allowed, so I'd write that with a * instead of a +. But it's your language.)

Parse lambda calculus style function applications with LL1 parser

1 Answers1