1

i'm implementing a LL(1) parser for a project of doing a shell implementation. i'm stuck trying to resolve conflicts in my grammar :

Parsing mode: LL(1).

Grammar:

     1. COMMAND_LINE -> COMPLETE_COMMAND PIPED_CMD
     2. PIPED_CMD -> PIPE COMPLETE_COMMAND PIPED_CMD
     3.            | ε
     4. COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX
     5. CMD_PREFIX -> REDIRECTION CMD_PREFIX
     6.             | ε
     7. CMD_SUFFIX -> REDIRECTION CMD_SUFFIX
     8.             | CMD_ARG CMD_SUFFIX
     9.             | ε
    10. REDIRECTION -> REDIRECTION_OP WORD
    11.              | ε
    12. CMD -> WORD
    13. CMD_ARG -> WORD CMD_ARG
    14.          | SINGLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
    15.          | DOUBLE_QUOTE WORD DOUBLE_QUOTE CMD_ARG
    16.          | ε
    17. REDIRECTION_OP -> HERE_DOC
    18.                 | APPEND
    19.                 | INFILE
    20.                 | OUTFILE

i use syntax-cli to check my grammar, and the ll(1) parser is a home made implementation, i can link my implementation of the parser if needed. the conflict detected by syntax-cli are :

PIPE WORD SINGLE_QUOTE DOUBLE_QUOTE HERE_DOC APPEND INFILE OUTFILE $
CMD_SUFFIX 9 7/8 7/8 7/8 7/8 7/8 7/8 7/8 9
REDIRECTION 11 11 11 11 10/11 10/11 10/11 10/11 11
CMD_ARG 16 13/16 14/16 15/16 16 16 16 16 16

i've also tried this grammar :


COMMAND_LINE     
                 : COMPLETE_COMMAND PIPED_CMD
                 ;
PIPED_CMD        
                 : PIPE COMPLETE_COMMAND PIPED_CMD
                 |
                 ;
COMPLETE_COMMAND 
                 : REDIRECTION CMD REDIRECTION CMD_ARG REDIRECTION
                 ;
REDIRECTION      
                 : REDIRECTION_OP WORD
                 | 
                 ;

CMD              
                 : WORD
                 ;
CMD_ARG          
                 : WORD REDIRECTION CMD_ARG
                 | SINGLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
                 | DOUBLE_QUOTE WORD DOUBLE_QUOTE REDIRECTION CMD_ARG
                 | REDIRECTION
                 ;
REDIRECTION_OP
                 : HERE_DOC
                 | APPEND
                 | INFILE
                 | OUTFILE
                 ;

but the parser don't work when using multiple redirections ...

tstanisl
  • 13,520
  • 2
  • 25
  • 40
Corecaps
  • 23
  • 6
  • There is certainly an error message, a line number, etc. Besides, what parser generator are you using? Yacc, Bison, another one? – chrslg Dec 16 '22 at 12:37
  • 1
    i added the conflicts detected by syntax-cli, i'm not generating the parser but writting it myself and use parser generator to validate that my grammar is LL(1) compatible. – Corecaps Dec 16 '22 at 12:51

1 Answers1

1

Without more specification on your behalf, can't be sure to have it all. But indeed, this grammar is ambiguous.

To build a LL(1) analyzer, you must be able to say, for any combination of symbol on the analyzer stack (symbol being either a terminal or non-terminal yet to read) and any word from the input buffer, what rule should apply.

Put yourself in the situation where you code starts with a WORD (that is first thing that is in input buffer)

You start by trying to analyze COMMAND_LINE

If input buffer starts with WORD, then only one rule can lead to COMMAND_LINE, that is the rule COMPLETE_COMMAND PIPED_CMD (anyway, whatever input, there is only this rule. Either we can apply it, or it is a syntax error. But for now, no reason to raise a syntax error, this rule is compatible with a start with WORD).

So, now, on your stack you have COMPLETE_COMMAND PIPED_CMD, and in input buffer, still the same WORD.

The only possible rule for the top of the stack is COMPLETE_COMMAND -> CMD_PREFIX CMD CMD_SUFFIX

So, now, on your stack you have CMD_PREFIX CMD CMD_SUFFIX PIPED_CMD.

And waiting in input buffer WORD

2 rules can be applied from CMD_PREFIX :
CMD_PREFIX -> REDIRECTION CMD_PREFIX
or CMD_PREFIX -> ε

None of them can start with WORD. So either we say that what we have here is an empty CMD_PREFIX (followed by a CMD starting with WORD)

Or we can see it as a REDIRECTION followed by an empty prefix. REDIRECTION can be REDIRECTION -> ε

So both are possible at this point. Either we have a CMD_PREFIX(ε) or we have a CMD_PREFIX(REDIRECTION(ε), ε) (or even more recursions).

For the grammar to be LL(1), we should not have to go deeper to decide. From this point, with the only knowledge that next lexem is WORD, we should be able to choose among those too. We aren't.

(In fact, even with other grammar than LL(1), we couldn't decide)

chrslg
  • 9,023
  • 5
  • 17
  • 31
  • thank you, the grammar was indeed ambiguous, i resolved it like this : https://pastecode.io/s/m4if77wg thanks for the help – Corecaps Dec 16 '22 at 13:04
  • 1
    Yep, seems the right thing to do indeed. Since the two places where "REDIRECTION" is allow also `ε`, no need for redirection to be possibly `ε`. And the otherway round (allowing redirection to be ε, but then saying that `CMD_PREFIX` or `CMD_SUFFIX` can't) would make it hard to have the chain-list of prefix and suffix. – chrslg Dec 16 '22 at 13:12
  • 1
    Whatever parser generator (even when using it as a parser verificator ;-)), debugging grammar is quite painful, and I often end up doing what I did in my answer: tracing myself. Messages are not helping much, because often a single error (as this one) lead to numerous ambiguities – chrslg Dec 16 '22 at 13:15
  • for example, here, I read from the table (surmising a bit how it should be read) that if you end up trying to analyze a `REDIRECTION` (that is on top of the stack) with a coming word being a `HEREDOC`, then you can apply both indifferently rule 10 and 11. That is either the redirection is that heredoc. Or the redirection is empty (and that is not a syntax error, because the heredoc can then be part of another redirection in the next prefix as possible with rule 5). And that is because of the same error. – chrslg Dec 16 '22 at 13:20
  • And that one, still is not that bad. At least one of the rules involved (11) is the one you needed to remove. But the same error implies that sometimes, analyzing a suffix, when a word is coming, you have no way to decide between rule 7 and 8. And neither rule 7 nor 8 is the real culprit. – chrslg Dec 16 '22 at 13:21
  • So, it often ends as this time: you understand the error message, once you found the error by yourself :D – chrslg Dec 16 '22 at 13:21