
According to section 7.8.1 of the ECMAScript specification, a NullLiteral is defined as follows:

NullLiteral :: 
    null

What I am trying to understand is how this is represented in tree form when NullLiteral appears in the following productions, found in sections 7.6.1 and 7.8:

ReservedWord :: 
    Keyword 
    FutureReservedWord 
    NullLiteral 
    BooleanLiteral

Literal :: 
    NullLiteral 
    BooleanLiteral 
    NumericLiteral 
    StringLiteral 

My best guess as to how it would look is this:

InputElementDiv
    |
  Token
    |
IdentifierName
    |
ReservedWord 
    |
 Literal 
    |
NullLiteral 
    |
   null

This just does not seem right to me though.

Note

From my research, it seems that very few compilers actually generate CSTs from the language grammar. I can understand why, but this is a learning exercise for me, so I want to get this right before I move on to more professional means of parsing, such as a parser generator.

Shog9
ChaosPandion
  • How about providing a link to the relevant section(s) of the spec? – LukeH Jul 14 '10 at 15:36
  • Can you explain a bit more about why the syntax tree you came up with doesn't seem right to you? I'm not seeing a problem there. – Jim Lewis Jul 15 '10 at 17:25
  • @Jim - I think my main confusion is about where to place `Literal` or `ReservedWord` in the hierarchy below `IdentifierName`. Who knows, maybe there isn't a problem. I am learning this all on my own, and there are no ECMAScript 5 CST parsers that I am aware of, and not for lack of searching. – ChaosPandion Jul 15 '10 at 17:36
  • @Jim - Just FYI, my current parser does work fine with the structure I used, but I really want to understand this before I finish my scripting engine. – ChaosPandion Jul 15 '10 at 17:39

1 Answer


The tree as shown is not covered by the grammar: the grammar provides no derivation from IdentifierName to ReservedWord, nor one from ReservedWord to Literal.
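One way to check a claim like this mechanically is to treat each parent/child edge of a unary chain as a production and look it up in the grammar. The sketch below assumes a hand-transcribed, simplified fragment of the ES5 lexical grammar containing only the alternatives needed here; `GRAMMAR`, `check_chain`, and `guessed_tree` are hypothetical names for illustration.

```python
# Hypothetical, simplified fragment of the ES5 lexical grammar.
# Each nonterminal maps to its list of alternatives (each a list of symbols).
GRAMMAR = {
    "InputElementDiv": [["WhiteSpace"], ["LineTerminator"], ["Comment"],
                        ["Token"], ["DivPunctuator"]],
    "Token": [["IdentifierName"], ["Punctuator"], ["NumericLiteral"],
              ["StringLiteral"]],
    "IdentifierName": [["IdentifierStart"], ["IdentifierName", "IdentifierPart"]],
    "ReservedWord": [["Keyword"], ["FutureReservedWord"], ["NullLiteral"],
                     ["BooleanLiteral"]],
    "Literal": [["NullLiteral"], ["BooleanLiteral"], ["NumericLiteral"],
                ["StringLiteral"]],
    "NullLiteral": [["null"]],
}


def check_chain(chain):
    """Return the first (parent, child) edge with no matching single-symbol
    production, or None if every edge of the unary chain is derivable."""
    for parent, child in zip(chain, chain[1:]):
        alternatives = GRAMMAR.get(parent, [])
        if [child] not in alternatives:
            return (parent, child)
    return None


# The tree from the question, read top to bottom as a unary chain.
guessed_tree = ["InputElementDiv", "Token", "IdentifierName", "ReservedWord",
                "Literal", "NullLiteral", "null"]

print(check_chain(guessed_tree))                       # first unsupported edge
print(check_chain(["ReservedWord", "Literal"]))        # also unsupported
print(check_chain(["Literal", "NullLiteral", "null"])) # fully derivable
```

Only the first failing edge is reported, so the ReservedWord to Literal step is checked separately.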

The ReservedWord production is in fact used only to restrict which IdentifierNames are valid Identifiers (Identifier :: IdentifierName but not ReservedWord), and this restriction should be applied at the lexical level. ReservedWord does not make it into a CST, where you would see just the IdentifierName.
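A minimal sketch of what a lexical-level restriction can look like in a hand-written lexer, assuming abbreviated word lists and an ASCII-only identifier pattern (ES5 also allows Unicode letters and escape sequences, and has more reserved words, including the strict-mode ones). The function `classify` and the labels it returns are hypothetical. The word lists are consulted as a lookup while scanning, so no ReservedWord node is ever built.

```python
import re

# Abbreviated word lists; the real ES5 lists are longer.
KEYWORDS = {"break", "case", "function", "if", "return", "this", "var"}
FUTURE_RESERVED = {"class", "const", "enum", "export", "extends", "import", "super"}
BOOLEAN_LITERALS = {"true", "false"}

# ASCII-only approximation of IdentifierName.
IDENTIFIER_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def classify(text):
    """Scan one IdentifierName and return the token kind the parser will see."""
    if not IDENTIFIER_NAME.fullmatch(text):
        raise ValueError(f"not an IdentifierName: {text!r}")
    if text == "null":
        return "NullLiteral"
    if text in BOOLEAN_LITERALS:
        return "BooleanLiteral"
    if text in KEYWORDS:
        return "Keyword"
    if text in FUTURE_RESERVED:
        return "FutureReservedWord"
    # Not a ReservedWord, so the IdentifierName is an Identifier.
    return "Identifier"


for word in ["null", "nullable", "var", "enum", "true"]:
    print(word, classify(word))
```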

The context of Literal is PrimaryExpression, so a fragment of a real CST could look like this:

   ...
    |
PrimaryExpression
    |
 Literal 
    |
NullLiteral 
    |
   null
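To make the fragment concrete, here is a sketch of a CST node type and a builder for exactly that chain. `Node`, `render`, and `null_primary_expression` are illustrative names, not part of the spec or any library, and the lexer is assumed to have already produced the `null` terminal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A CST node: a grammar symbol plus the children it was derived into."""
    symbol: str
    children: list[Node] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented lines, one symbol per line."""
    lines = ["  " * depth + node.symbol]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def null_primary_expression():
    """Build the fragment PrimaryExpression -> Literal -> NullLiteral -> null."""
    leaf = Node("null")  # terminal supplied by the lexer
    return Node("PrimaryExpression",
                [Node("Literal", [Node("NullLiteral", [leaf])])])


print("\n".join(render(null_primary_expression())))
```

In a full parser the PrimaryExpression node would itself hang below whatever expression productions led to it, which is the part elided as `...` above.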
Gunther
  • So does this mean InputElementDiv does not belong in the CST? – ChaosPandion Jul 16 '10 at 17:06
  • No, it doesn't. The spec says that it is the goal symbol of the lexical grammar, which, along with InputElementRegExp, is meant to distinguish two lexers. Actually, when using the InputElementDiv production, you'd derive "null" as an IdentifierName. From the syntax and CST point of view, look instead at the symbols that can be derived from Program. – Gunther Jul 16 '10 at 21:48
  • Never in my life have I had to work so hard to understand something. It is quite amazing. I am just waiting for it to finally click. Anyway, let me think about what you have said just in case I need a bit of clarification. – ChaosPandion Jul 16 '10 at 22:11
  • I appreciate your answer. May I ask what your background is? – ChaosPandion Jul 19 '10 at 16:30
  • Thanks for your generous award, and for bringing the ECMAScript spec to my attention. I am now using its grammar as a test case for my parser generator. In fact, I think that their idea of "lexical grammar goal symbols" is not overly helpful. – Gunther Jul 19 '10 at 21:43
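On the two lexical goal symbols: the spec uses InputElementRegExp wherever the syntactic context permits a regular expression literal, and InputElementDiv elsewhere, so the parser, not the lexer, really decides. Many hand-written lexers approximate that with a previous-token rule instead. The sketch below is that heuristic, not the spec's rule; the function `lexer_goal` and its token-kind labels are hypothetical.

```python
# Hypothetical token-kind labels after which `/` is read as division.
VALUE_KINDS = {"Identifier", "NullLiteral", "BooleanLiteral",
               "NumericLiteral", "StringLiteral"}
CLOSING_PUNCTUATORS = {")", "]", "}"}


def lexer_goal(previous):
    """Guess the lexical goal symbol for the next token.

    `previous` is a (kind, text) pair for the last significant token, or
    None at the start of input. This previous-token heuristic misreads
    some inputs, e.g. a regex right after the `)` of an `if (...)` header
    or after a block-closing `}`.
    """
    if previous is None:
        return "InputElementRegExp"
    kind, text = previous
    if kind in VALUE_KINDS:
        return "InputElementDiv"
    if kind == "Punctuator" and text in CLOSING_PUNCTUATORS:
        return "InputElementDiv"
    if kind == "Keyword" and text == "this":
        return "InputElementDiv"
    return "InputElementRegExp"


print(lexer_goal(("NullLiteral", "null")))  # `null / 2`: slash is division
print(lexer_goal(("Punctuator", "=")))      # `x = /ab+c/`: slash starts a regex
```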