
Disclaimer

I AM aware that the question Why does 10..toString() work, but 10.toString() does not? exists, but the thing is that it doesn't provide a formal explanation.

The specification's interpretation of the . character in that particular position is that it will be a decimal. This is defined by the numeric literal syntax of ECMAScript.

An answer without a reference to the standard isn't trustworthy enough.

The question body

I intuitively understand that

42..toString()

is treated by the parser as the number literal 42. followed by a .toString() call.

What I cannot understand is why an interpreter cannot realize that

42.toString()

is the literal 42 followed by a method call.

Is this just a limitation of modern JS interpreters, or is it explicitly mandated by ES5.1?

In ES5.1, NumericLiteral is defined as follows (only the relevant part of the definition):

NumericLiteral ::
    DecimalLiteral
    HexIntegerLiteral

DecimalLiteral ::
    DecimalIntegerLiteral . DecimalDigits(opt) ExponentPart(opt)
    . DecimalDigits ExponentPart(opt)
    DecimalIntegerLiteral ExponentPart(opt)

The last rule is the one I would expect the parser to choose.
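To see which production covers which spelling (my own illustrative examples, not from the spec), each literal can be fed to eval so the engine's own lexer does the classification:

```javascript
// Each DecimalLiteral production, exercised through eval so the engine's
// lexer classifies the source text:
const examples = {
  "42.":    "DecimalIntegerLiteral . DecimalDigits(opt)",           // first rule
  "42.5e3": "DecimalIntegerLiteral . DecimalDigits ExponentPart",   // first rule
  ".5":     ". DecimalDigits",                                      // second rule
  "42e3":   "DecimalIntegerLiteral ExponentPart",                   // last rule
  "42":     "DecimalIntegerLiteral",                                // last rule
};
for (const [src, rule] of Object.entries(examples)) {
  console.log(src, "=>", eval(src), "(", rule, ")");
}
```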

UPD: to clarify, this question expects as an answer a reference to the ES specification that explicitly states that the interpreter must behave the way it does.

zerkms

2 Answers


I believe the piece you're missing is this quote from section 7:

The source text is scanned from left to right, repeatedly taking the longest possible sequence of characters as the next input element.

Note "longest possible sequence of characters"; since "42." is a valid token (which is a kind of input element), it must be used rather than "42" and then ".".
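This "longest possible sequence" rule is commonly called maximal munch. A toy lexer (a hypothetical sketch with simplified patterns, not the spec's actual algorithm) shows why "42." beats "42":

```javascript
// Toy maximal-munch lexer: at each position, take the LONGEST matching token.
function tokenize(src) {
  const patterns = [
    { type: "number", re: /^(?:\d+\.\d*|\.\d+|\d+)/ }, // "42." outlasts "42"
    { type: "ident",  re: /^[A-Za-z_$][A-Za-z0-9_$]*/ },
    { type: "punct",  re: /^[().]/ },
  ];
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    let best = null;
    for (const { type, re } of patterns) {
      const m = re.exec(src.slice(i));
      if (m && (!best || m[0].length > best.text.length)) {
        best = { type, text: m[0] };
      }
    }
    if (!best) throw new SyntaxError("unexpected character at " + i);
    tokens.push(best);
    i += best.text.length;
  }
  return tokens;
}

// "42." is consumed as one number token, so no "." is left for property access:
console.log(tokenize("42.toString").map(t => t.text));  // "42.", "toString"
// With "42..", the literal ends before the second ".", which survives as punctuation:
console.log(tokenize("42..toString").map(t => t.text)); // "42.", ".", "toString"
```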

Jesse Rusak

The lexical phase of the parsing process consumes the . precisely because of the definition of NumericLiteral quoted from the standard. The definition specifies that a numeric literal (here a DecimalLiteral) includes a ., if present; see the first production under DecimalLiteral ::.

Since 42. is consumed by the numeric literal token, the next token is toString, which lacks a leading period and therefore cannot be recognized as a method call.

user4815162342
  • "precisely due to the definition of NumericLiteral quoted from the standard" --- see the `DecimalIntegerLiteral ExponentPart(opt)` rule which doesn't have a required `.` – zerkms Jun 25 '14 at 00:44
  • @zerkms It is not required, but it's there in your example, and the lexer will consume it for the token. – user4815162342 Jun 25 '14 at 00:58
  • that is the whole point of my question - why it does that. And another answer explains that. Whereas this one is based on... on what actually? – zerkms Jun 25 '14 at 00:58
  • @zerkms Even without the sentence quoted in Jesse Rusak's answer, EcmaScript would still behave the same. Context-free grammars do not go back to lexical analysis to try a different tokenization of input in case of syntax error. – user4815162342 Jun 25 '14 at 08:01
  • "Context-free grammars do not go back to lexical analysis" --- what is this statement based on? Is it defined in ES? If yes - where? (this is the whole point of my question) If not, how am I supposed to know the ES standard is based on a given kind of grammar? – zerkms Jun 25 '14 at 08:13
  • @zerkms 5.1.1 and 5.1.2 make it clear that ES is specified with a context-free grammar. Also, section 7 explicitly states, *The source text of an ECMAScript program is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space.* I.e. tokenization is done first and does not depend on the grammatical interpretation of the input. – user4815162342 Jun 25 '14 at 08:29
  • That should have been put into an answer, because that is what the question was about (it's the 3rd time I'm mentioning this in this conversation) – zerkms Jun 25 '14 at 08:52
  • @zerkms You've made that point quite clear, yes. However, the question already quotes the standard, and the answer clearly refers to that quotation. – user4815162342 Jun 25 '14 at 09:10
  • I promise it's the last comment here :-D From both the question and this answer it's not obvious why the parser must prefer the last `DecimalLiteral` rule over the first. – zerkms Jun 25 '14 at 09:13
  • @zerkms Actually, it must prefer the first. :) It may not be *obvious*, but it does follow from the widely understood meaning of the notation used in your quote. Consider this: if the lexer had the freedom to interpret the input `42.toString` as `42` followed by `.` followed by `toString`, the input `42.3` would become a syntax error. If the standard allowed the lexer to look ahead or be revisited after the syntactic analysis was shown to fail, then it wouldn't use a BNF-like notation nor mention context-free grammars in the first place. – user4815162342 Jun 25 '14 at 09:36