2

Say that I am creating a simple JavaScript language parser that is only concerning with parsing functions.

I need to differentiate between function "declarations"/"statements" and function expressions. Because they look practically identical, I figure I need to know the context in which function is used.

I suppose that I can determine a function expression by the preceding token. I am thinking that the following algorithm might work:

  • If token is "function", then
    • If previous token is an operator,
      except for a "closing" operator, like "]", "}", or ")", OR
      if previous token is ":", then
      • Function is an function expression.
    • Else
      • Function is a function declaration.

Can I expect this algorithm to correctly determine if a function is a declaration or an expression? If it has flaws, what should be fixed? Or, if it is not possible to distinguish the forms simply by looking at the previous token, how else could I distinguish the forms with the least amount of effort?

(I am aware that Esprima and co. exist. I want to implement a native parser in a different language.)

Jackson
  • 9,188
  • 6
  • 52
  • 77
  • In which language is the parser being implemented? – Anderson Green Nov 30 '14 at 20:28
  • @AndersonGreen Emacs Lisp. – Jackson Nov 30 '14 at 20:28
  • Several parser generators have been written in Emacs Lisp. See the discussion here: http://stackoverflow.com/questions/2228477/parsing-in-emacs-lisp – Anderson Green Nov 30 '14 at 20:41
  • Whether or not I used a parser generator, I would still need to determine the accuracy of the algorithm in my question. – Jackson Nov 30 '14 at 21:06
  • 1
    What kind of parser are you writing? Examining only the previous token will probably give false results on some edge case. Can't you just look at the surrounding context, whether it expects an expression or a statement, to follow the ES5 grammar more closely? – Bergi Nov 30 '14 at 21:30
  • The parser is one which finds functions in JavaScript source code. If you have an idea how I may "look at the surrounding context" simply, it may make for a good answer. – Jackson Nov 30 '14 at 22:09
  • @Jackson: so you're not trying to parse the whole program code into an AST, only find functions in a source text? – Bergi Nov 30 '14 at 22:12
  • I don't think it matters in the context of this question, but I am determining the scopes of variables inside functions. So I will be looking for functions, the scopes of their names (so it matters if they are declarations or expressions), the scopes of their parameters, and the scopes of their local variables. For this I will also need to parse `var` statements. Trivially, I also plan on parsing `catch` statements because they create scopes too. I'm not thinking about `let` right now. – Jackson Nov 30 '14 at 22:18
  • So no, I'm not trying to create an AST, unless of course it's required. Perhaps I will need a simple one because I will need to make 2 passes due to variable hoisting. – Jackson Nov 30 '14 at 22:20
  • @Jackson—in terms of scope, there is no difference between a function expression and a function declaration, they both create a new execution context and new scope. If you ignore *eval*, there is only global and function scope. – RobG Nov 30 '14 at 23:24
  • @RobG: But there's much difference for the scope of the identifier that was used for the function's name. – Bergi Nov 30 '14 at 23:32

1 Answers1

2

I'm also writing a JavaScript parser - for Java, with JavaCC. Is it "en vogue"? :)

I'm no way an expert so my terminology may be somewhat household level, please excuse that.

If I understood your idea right, seems like you want to distinguish function declarations and expressions on the lexical level. I think this is a wrong way. JavaScript has a very tricky grammar, this may work with function declarations but you'll be hitting corner cases all over the way. Two of the most complicated are automatic semicolon insertion and regular expressions vs. division.

Now to your question.

Grammar:

FunctionDeclaration :
    function Identifier ( FormalParameterList_opt ) { FunctionBody }

FunctionExpression :
    function Identifier_opt ( FormalParameterList_opt ) { FunctionBody }

One case function ( is easy. No Identifier - can't be a FunctionDeclaration. However this does not guarantee that this can be FunctionExpression: function () {} on the top level is grammatically incorrect.

FunctionExpression may appear where expressions may appear except for ExpressionStatement.

So the question is, can you reliably find out if you can expect an expression in some place lexically (i.e. just looking on previous token).

I think this might be rather hard. Take a look at my analysis for the similar problem (detecting a regex lexically).

For you algorithm:

  • If token is "function", then
    • If previous token is an operator, except for a "closing" operator, like "]", "}", or ")", OR
      if previous token is ":", then
      • Function is an function expression.
    • Else
      • Function is a function declaration.

What if the previous token was /? And function comes next? You'll think this is a function expression but this may be a regular expression literal.

Also : does not mean this is a function expression, this may be invalid:

label: function() {}

I also think further complications may be with ASI. Consider:

i++
function a() {}

++ is a postfix operator preceding function, but function a() {} is a function declaration, there was a semicolon inserted automatically before it.

So I think your algorithm is not correct. And I'm not sure you can get away with simply looking at a couple of previous tokens.

Community
  • 1
  • 1
lexicore
  • 42,748
  • 17
  • 132
  • 221
  • @PiotrDabkowski—in the code you posted, `;` is an empty statement, `function f() {}` is a function declaration (keyword *function* at the start of a statement), `+ 1` is an expression that resolves to the value positive one. – RobG Nov 30 '14 at 23:21
  • Hmm that's strange, thanks. But * will not resolve :) I think its a bug in the implementation since it should parse as far as possible without ; insertion. And +1 is not treated as a number literal (literals are without a sign). – Piotr Dabkowski Nov 30 '14 at 23:22
  • 1
    @PiotrDabkowski `;function f() {} * 1` is invalid. `;` is an empty statement, `function f() {}` is a function declaration, `* 1` can't appear. – lexicore Nov 30 '14 at 23:27
  • Thats very strange because 1*function f() {} works. And I cant find anything in the specification that would explain an error in function f() {} * 1 . function f() {} should be treated as a named function expression in both cases since its a part of an expression. It seems that JS implementations dont care about following tokens. – Piotr Dabkowski Nov 30 '14 at 23:30
  • @PiotrDabkowski `function f() {}` is a function declaration this a source element. When program is parsed, it matches a top-level source element. So the following `* 1` must also be a source element which it is not. – lexicore Nov 30 '14 at 23:33
  • Okay I made further checks and it seems that implementations don't care about following tokens, it makes things simpler but I think that its not correct according to the specification. – Piotr Dabkowski Nov 30 '14 at 23:40