4

I'm trying to understand how JS is actually parsed. But my searches either return some ones very vaguely documented project of a "parser/generator" (i don't even know what that means), or how to parse JS using a JS Engine using the magical "parse" method. I don't want to scan through a bunch of code and try all my life to understand (although i can, it would take too long).

i want to know how an arbitrary string of JS code is actually turned into objects, functions, variables etc. I also want to know the procedures, and techniques that turns that string into stuff, gets stored, referenced, executed.

Are there any documentation/references for this?

Joseph
  • 117,725
  • 30
  • 181
  • 234
  • 2
    JavaScript is parsed just as any other language is parsed; http://en.wikipedia.org/wiki/Parsing. The file is matched against a defined grammer and split up into tokens. It is then interpreted so it can be executed. http://en.wikipedia.org/wiki/Interpreter_(computing) – Matt Apr 05 '12 at 08:38

3 Answers3

4

Parsers probably work in all sorts of ways, but fundamentally they first go through a stage of tokenisation, then give the result to the compiler, which turns it into a program if it can. For example, given:

function foo(a) {
  alert(a);
}

the parser will remove any leading whitespace to the first character, the letter "f". It will collect characters until it gets something that doesn't belong, the whitespace, that indicates the end of the token. It starts again with the "f" of "foo" until it gets to the "(", so it now has the tokens "function" and "foo". It knows "(" is a token on its own, so that's 3 tokens. It then gets the "a" followed by ")" which are two more tokens to make 5, and so on.

The only need for whitespace is between tokens that are otherwise ambiguous (e.g. there must be either whitespace or another token between "function" and "foo").

Once tokenisation is complete, it goes to the compiler, which sees "function" as an identifier, and interprets it as the keyword "function". It then gets "foo", an identifier that the language grammar tells it is the function name. Then the "(" indicates an opening grouping operator and hence the start of a formal parameter list, and so on.

Compilers may deal with tokens one at a time, or may grab them in chunks, or do all sorts of weird things to make them run faster.

You can also read How do C/C++ parsers work?, which gives a few more clues. Or just use Google.

Community
  • 1
  • 1
RobG
  • 142,382
  • 31
  • 172
  • 209
1

While it doesn't correspond closely to the way real JS engines work, you might be interested in reading Douglas Crockford's article on Top Down Operator Precedence, which includes code for a small working lexer and parser written in the Javascript subset it parses. It's very readable and concise code (with good accompanying explanations) which at least gives you an outline of how a real implementation might work.

A more common technique than Crockford's "Top Down Operator Precedence" is recursive descent parsing, which is used in Narcissus, a complete implementation of JS in JS.

0

maybe esprima will help you to understand how JS parses the grammar. it's online

  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 13 '22 at 12:28