0

I'm following the tiger book to write a compiler.

In chapter 3, based on the github's code and my understanding, I filled in the following rules for the dec:

decs:
    %empty
    | decs dec
;

dec:
    tydec
    | vardec
    | fundec
;

tydec: 
    TYPE ID '=' ty
;

vardec:
    VAR ID ASSIGN exp
    | VAR ID ':' ID ASSIGN exp
;

fundec:
    FUNCTION ID '(' tyfields ')' '=' exp
    | FUNCTION ID '(' tyfields ')' ':' ID '=' exp

However, in chap 4, the book provided the following functions for ast:

A_fundecList A_FundecList(A_fundec head, A_fundecList tail);
A_nametyList A_NametyList(A_namety head, A_nametyList tail);

Which made the most of code I found adjust the decs token as follow

decs:
    %empty
    | decs dec
;

dec:
    tydecs
    | vardec
    | fundecs
;

tydecs:
    tydec
    | tydec tydecs

tydec: 
    TYPE ID '=' ty
;

vardec:
    VAR ID ASSIGN exp
    | VAR ID ':' ID ASSIGN exp
;

fundecs: 
    fundec
    | fundec fundecs    {$$ = A_FundecList($1, $2);}
;

fundec:
    FUNCTION ID '(' tyfields ')' '=' exp
    | FUNCTION ID '(' tyfields ')' ':' ID '=' exp

The list token fundecs and tydecs were added into the production rule.

I do not understand why doing that, since this will obviously create conflict. Because decs is a list can contain fundecs and tydecs. So a list of fundecs, for example, can be reduced to either a list of decs or a list of fundecs.

Thus I would like to ask why doing this, what is the reason of adding conflict grammar for the parser??

Thanks a lot!!!

Shore
  • 827
  • 9
  • 24
  • 2
    If you write your own code, you will have to think hard about how to do it, which is the best way to learn. You will have to take responsibility for your own mistakes, and figure out how to fix them, but you will only make those mistakes once. If your idea of programming is to search out other people's attempts and copy them, you will just get a superficial learning, because copying doesn't stick in the brain, and you'll end with all of their mistakes. I see from your question that you already understand better than what you found. So close the internet and write your parser. – rici Nov 10 '21 at 18:21
  • 2
    To answer your question: the `A` in AST means "abstract"; in other words, an AST does not slavishly map the grammar. Rather, it attempts to create a useful representation. In this case, the language allows declarations to be mixed, which is convenient, but the compiler will want the different kinds of thing separated. So when making the list of declarations, the semantic action puts each declaration on the correct list. – rici Nov 10 '21 at 18:25
  • @rici Thanks for your answer, could you be so kind explain more on why the compiler would prefer a list of same kind of declarations than a mixed one?? Since in my view I can write declarations with each line different type, which makes the ast producing extra `fundecs` node......... With the first reply, I would like to see the deep reason of this and deciding which one is better..... Thanks again for your answer, would you like to answer this question and I'll accept it.. – Shore Nov 11 '21 at 00:50
  • 1
    because you know what you are looking for when you are looking up a name :-) Since types and variables are in different namespaces, you could use the same name for a type and a variable; putting both of those in a single list would be confusing. Also, all the objects in each list are of the same type, whereas the different lists are of different types, so keeping them in separate lists reduces complications. – rici Nov 11 '21 at 03:05
  • 1
    The other thing, now that I've taken another look at the Tiger spec, is that a contiguous sequence of type declarations is allowed to be recursive, as is a contiguous sequence of function declarations. In both cases, the scope of the declaration starts at the beginning of the contiguous sequence. That's a bit unconventional (and it's not the only unconventional thing in Tiger), and it needs to be managed in some fashion. I think that's what the grammar you quote is trying to do. (Of course, it is ambiguous, but the ambiguity is fairly easy to resolve, for example by using operator precedence.) – rici Nov 11 '21 at 03:08
  • 1
    I'm not certain that that would be my approach to solving that particular problem, but it is a plausible approach. – rici Nov 11 '21 at 03:12
  • @rici sorry for the late reply, I see the mutually recursive definition of function and type declaration and the same name issue.... Which make it a good reason to detect errors for that. Thanks for you wonderful explanation and may I ask you to post your answer, so that I'll accept it lol~~~~~~~~ – Shore Nov 12 '21 at 08:12
  • 1
    @rici I think the most valid reason is the recursive declarations. Which actually means that functions (or types) can reference other function that hasn't been declared. The solution of tiger is to scan them in 2 round: 1. build a empty name type and insert into symbol table first; 2. analyze what it really represent for. – Shore Nov 30 '21 at 12:50

0 Answers0