
Disclaimer

I AM aware that the question Why does 10..toString() work, but 10.toString() does not? exists, but the thing is that it doesn't provide a formal explanation.

The specification's interpretation of the . character in that particular position is that it will be a decimal. This is defined by the numeric literal syntax of ECMAScript.

An answer without a reference to the standard isn't trustworthy enough.

The question body

I intuitively understand that

42..toString()

is treated by the parser as the number literal 42. followed by a .toString() call.

What I cannot understand is why an interpreter cannot realize that

42.toString()

is the literal 42 followed by a method call.

Is this just a limitation of modern JS interpreters, or is it explicitly mandated by ES5.1?

In ES5.1, NumericLiteral is defined as follows (only the relevant part of the definition):

NumericLiteral ::
    DecimalLiteral
    HexIntegerLiteral

DecimalLiteral ::
    DecimalIntegerLiteral . DecimalDigits(opt) ExponentPart(opt)
    . DecimalDigits ExponentPart(opt)
    DecimalIntegerLiteral ExponentPart(opt)

The last rule is the one I would expect the parser to choose.
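To see which production covers which spelling (my own illustrative examples, not from the spec), each literal can be fed to eval so the engine's own lexer does the classification:

```javascript
// Each DecimalLiteral production, exercised through eval so the engine's
// lexer classifies the source text:
const examples = {
  "42.":    "DecimalIntegerLiteral . DecimalDigits(opt)",           // first rule
  "42.5e3": "DecimalIntegerLiteral . DecimalDigits ExponentPart",   // first rule
  ".5":     ". DecimalDigits",                                      // second rule
  "42e3":   "DecimalIntegerLiteral ExponentPart",                   // last rule
  "42":     "DecimalIntegerLiteral",                                // last rule
};
for (const [src, rule] of Object.entries(examples)) {
  console.log(src, "=>", eval(src), "(", rule, ")");
}
```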

UPD: to clarify, this question expects as an answer a reference to the ES specification that explicitly states that the interpreter must behave the way it does.

zerkms

2 Answers


I believe the piece you're missing is this quote from section 7:

The source text is scanned from left to right, repeatedly taking the longest possible sequence of characters as the next input element.

Note "longest possible sequence of characters"; since "42." is a valid token (which is a kind of input element), it must be used rather than "42" and then ".".
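This "longest possible sequence" rule is commonly called maximal munch. A toy lexer (a hypothetical sketch with simplified patterns, not the spec's actual algorithm) shows why "42." beats "42":

```javascript
// Toy maximal-munch lexer: at each position, take the LONGEST matching token.
function tokenize(src) {
  const patterns = [
    { type: "number", re: /^(?:\d+\.\d*|\.\d+|\d+)/ }, // "42." outlasts "42"
    { type: "ident",  re: /^[A-Za-z_$][A-Za-z0-9_$]*/ },
    { type: "punct",  re: /^[().]/ },
  ];
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    let best = null;
    for (const { type, re } of patterns) {
      const m = re.exec(src.slice(i));
      if (m && (!best || m[0].length > best.text.length)) {
        best = { type, text: m[0] };
      }
    }
    if (!best) throw new SyntaxError("unexpected character at " + i);
    tokens.push(best);
    i += best.text.length;
  }
  return tokens;
}

// "42." is consumed as one number token, so no "." is left for property access:
console.log(tokenize("42.toString").map(t => t.text));  // "42.", "toString"
// With "42..", the literal ends before the second ".", which survives as punctuation:
console.log(tokenize("42..toString").map(t => t.text)); // "42.", ".", "toString"
```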

Jesse Rusak

The lexical phase of the parsing process consumes the . precisely because of the definition of NumericLiteral quoted from the standard. The definition specifies that a numeric literal (here a DecimalLiteral) includes a ., if present; see the first production under DecimalLiteral ::.

Since 42. is consumed by the numeric literal token, the next token is toString, which lacks a leading period and therefore cannot be recognized as a method call.

user4815162342
  • "precisely due to the definition of NumericLiteral quoted from the standard" --- see the `DecimalIntegerLiteral ExponentPart(opt)` rule which doesn't have a required `.` – zerkms Jun 25 '14 at 00:44
  • @zerkms It is not required, but it's there in your example, and the lexer will consume it for the token. – user4815162342 Jun 25 '14 at 00:58
  • that is the whole point of my question - why it does that. And another answer explains that. Whereas this one is based on... on what actually? – zerkms Jun 25 '14 at 00:58
  • @zerkms Even without the sentence quoted in Jesse Rusak's answer, EcmaScript would still behave the same. Context-free grammars do not go back to lexical analysis to try a different tokenization of input in case of syntax error. – user4815162342 Jun 25 '14 at 08:01
  • "Context-free grammars do not go back to lexical analysis" --- what is this statement based on? Is it defined in ES? If yes - where? (this is the whole point of my question) If not, how am I supposed to know the ES standard is based on a given kind of grammar? – zerkms Jun 25 '14 at 08:13
  • @zerkms 5.1.1 and 5.1.2 make it clear that ES is specified with a context-free grammar. Also, section 7 explicitly states, *The source text of an ECMAScript program is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space.* I.e. tokenization is done first and does not depend on the grammatical interpretation of the input. – user4815162342 Jun 25 '14 at 08:29
  • That should have been put into an answer, because that is what the question was about (it's the 3rd time I'm mentioning this in this conversation) – zerkms Jun 25 '14 at 08:52
  • @zerkms You've made that point quite clear, yes. However, the question already quotes the standard, and the answer clearly refers to that quotation. – user4815162342 Jun 25 '14 at 09:10
  • I promise it's the last comment here :-D From both the question and this answer it's not obvious why the parser must prefer the last `DecimalLiteral` rule over the first. – zerkms Jun 25 '14 at 09:13
  • @zerkms Actually, it must prefer the first. :) It may not be *obvious*, but it does follow from the widely understood meaning of the notation used in your quote. Consider this: if the lexer had the freedom to interpret the input `42.toString` as `42` followed by `.` followed by `toString`, the input `42.3` would become a syntax error. If the standard allowed the lexer to look ahead or be revisited after the syntactic analysis was shown to fail, then it wouldn't use a BNF-like notation nor mention context-free grammars in the first place. – user4815162342 Jun 25 '14 at 09:36