Context

I'm parsing code, where...

  • This code outputs the element of the two-dimensional array a at index (i, 1):

    Debug.Print a(i, 1)
    
  • This code outputs the result of function a given parameters i and 1:

    Debug.Print a(i, 1)
    
  • This code calls procedure DoSomething while evaluating foo as a value and passing it by value to the procedure (regardless of whether the signature has it as a "by reference" parameter):

    DoSomething (foo)
    
  • This code calls procedure DoSomething without evaluating foo as a value, and passing it by reference if the signature takes the parameter "by reference":

    Call DoSomething(foo)
    

So I have this lExpression parser rule that's problematic, because the first alternative (#indexExpr) matches both the array and the procedure call:

lExpression :
    lExpression whiteSpace? LPAREN whiteSpace? argumentList? whiteSpace? RPAREN                                     # indexExpr
    | lExpression mandatoryLineContinuation? DOT mandatoryLineContinuation? unrestrictedIdentifier                  # memberAccessExpr
    | lExpression mandatoryLineContinuation? EXCLAMATIONPOINT mandatoryLineContinuation? unrestrictedIdentifier     # dictionaryAccessExpr
    | ME                                                                                                            # instanceExpr
    | identifier                                                                                                    # simpleNameExpr
    | DOT mandatoryLineContinuation? unrestrictedIdentifier                                                         # withMemberAccessExpr
    | EXCLAMATIONPOINT mandatoryLineContinuation? unrestrictedIdentifier                                            # withDictionaryAccessExpr
;

The problem

The specific issue I'm trying to fix here is best depicted by the stack trace from the parse exception that's thrown with this code:

Sub Test()
    DoSomething (foo), bar
End Sub

(Image: failing test stack trace)

I can see the callStmt() rule kicking in as it should, but then the expression that's meant to match DoSomething is matching a #lExpr that captures what should be the "argument list", but instead gets picked up as an array index.

Everything I've tried, from moving the #parenthesizedExpr up to a higher priority than #lExpr, to making a memberExpression rule and using that instead of expression in the callStmt rule, has failed (the project builds, but I end up with 1500 failing tests because nothing parses anymore).

The reason #lExpr matches DoSomething (foo) is specifically because, well, it's perfectly legal to have an indexExpr there - it's as if I needed some way to ignore a rule in the parsing, but only when I know that there's a callStmt in the lineage.

Is it even possible to disambiguate a(i, 1) (the array call) from a(i, 1) (the function call)?

If so... how?


Additional context

Here's the expression rule from which the lExpression rule is called:

expression :
    // Literal Expression has to come before lExpression, otherwise it'll be classified as simple name expression instead.
    literalExpression                                                                               # literalExpr
    | lExpression                                                                                   # lExpr
    | builtInType                                                                                   # builtInTypeExpr
    | LPAREN whiteSpace? expression whiteSpace? RPAREN                                              # parenthesizedExpr
    | TYPEOF whiteSpace expression                                                                  # typeofexpr        // To make the grammar SLL, the type-of-is-expression is actually the child of an IS relational op.
    | NEW whiteSpace expression                                                                     # newExpr
    | expression whiteSpace? POW whiteSpace? expression                                             # powOp
    | MINUS whiteSpace? expression                                                                  # unaryMinusOp
    | expression whiteSpace? (MULT | DIV) whiteSpace? expression                                    # multOp
    | expression whiteSpace? INTDIV whiteSpace? expression                                          # intDivOp
    | expression whiteSpace? MOD whiteSpace? expression                                             # modOp
    | expression whiteSpace? (PLUS | MINUS) whiteSpace? expression                                  # addOp
    | expression whiteSpace? AMPERSAND whiteSpace? expression                                       # concatOp
    | expression whiteSpace? (EQ | NEQ | LT | GT | LEQ | GEQ | LIKE | IS) whiteSpace? expression    # relationalOp
    | NOT whiteSpace? expression                                                                    # logicalNotOp
    | expression whiteSpace? AND whiteSpace? expression                                             # logicalAndOp
    | expression whiteSpace? OR whiteSpace? expression                                              # logicalOrOp
    | expression whiteSpace? XOR whiteSpace? expression                                             # logicalXorOp
    | expression whiteSpace? EQV whiteSpace? expression                                             # logicalEqvOp
    | expression whiteSpace? IMP whiteSpace? expression                                             # logicalImpOp
    | HASH expression                                                                               # markedFileNumberExpr // Added to support special forms such as Input(file1, #file1)
;

And the callStmt rule, which is meant to pick up only procedure calls (which may or may not be preceded by a Call keyword):

callStmt :
    CALL whiteSpace expression
    | expression (whiteSpace argumentList)?
;
Mathieu Guindon
  • I know nothing about vba. But from the first 2 examples given by you, it seems like vba grammar is context sensitive. Whether "Debug.Print a(i, 1)" means a function call or an array referencing depends on the type of identifier 'a'. You need to build a table of identifiers and their types at the current scope and use semantic predicates to guide your parse. – JavaMan Nov 14 '16 at 08:29
  • "Is it even possible to disambiguate a(i, 1) (the array call) from a(i, 1) (the function call)?" - NO, if you rely on context-free grammar as the meaning of "a" depends on its previous declaration – JavaMan Nov 14 '16 at 08:31

2 Answers

(I've built VB6/VBA parsers).

No, with a pure context-free parsing engine you can't distinguish them at parse time, precisely because the syntax for a function call and the syntax for an array access are identical.

The simple thing to do is to parse the construct as array_access_or_function_call and disambiguate it after parsing: postprocess the tree, build a symbol table, look up the declaration of the entity whose scope contains the reference, and use that declaration to decide.
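The parse-then-resolve idea can be sketched roughly as follows. This is a minimal illustration in Python, not code from any real VBA parser: the names `IndexOrCall`, `Scope`, and `resolve` are hypothetical, and the symbol table only distinguishes "array" from "function".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IndexOrCall:
    """Ambiguous tree node the parser emits for `name(args)` (hypothetical)."""
    name: str
    args: List[str]
    kind: str = "unresolved"  # set to "array_access" or "function_call" later


@dataclass
class Scope:
    """One level of a symbol table; `parent` is the enclosing scope."""
    symbols: Dict[str, str] = field(default_factory=dict)  # name -> declared kind
    parent: Optional["Scope"] = None

    def declare(self, name: str, kind: str) -> None:
        # VBA identifiers are case-insensitive, so normalize the key.
        self.symbols[name.lower()] = kind

    def lookup(self, name: str) -> Optional[str]:
        # Walk outward from the innermost scope until a declaration is found.
        scope = self
        while scope is not None:
            kind = scope.symbols.get(name.lower())
            if kind is not None:
                return kind
            scope = scope.parent
        return None


def resolve(node: IndexOrCall, scope: Scope) -> IndexOrCall:
    """Post-parse pass: classify the node from the declaration it refers to."""
    declared = scope.lookup(node.name)
    if declared == "array":
        node.kind = "array_access"
    elif declared == "function":
        node.kind = "function_call"
    return node  # stays "unresolved" when no declaration is visible
```

With a module-level function `a` and a procedure-level array `a`, the same `a(i, 1)` node resolves differently depending on which scope contains the reference, which is why the decision is deferred until the declarations are known.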

This problem isn't unique to VB; C and C++ famously have a similar problem. The solution used in most C/C++ parsers is to have the parser collect declaration information as a side effect as it parses, and then consult that information when it encounters the instance syntax to decide.
This approach changes the parser into a context-sensitive one. The downside is that it tangles (at least partial) symbol table building with parsing, and your parsing engine may or may not cooperate making this more or less awkward to implement.

(I think ANTLR will let you call arbitrary code at various points in the parsing process, which can be used to save declaration information, and will let you call parse-time predicates to help guide the parser; these should be enough.)
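For comparison, the collect-declarations-while-parsing approach might look something like this. It is a toy, line-oriented recognizer written in Python for illustration only, not ANTLR and not a real VBA grammar; the regular expressions and the `parse` function are assumptions made for the sketch.

```python
import re

# Hypothetical, heavily simplified patterns for two kinds of declaration.
DIM_ARRAY = re.compile(r"^\s*Dim\s+(\w+)\s*\(", re.IGNORECASE)
FUNCTION = re.compile(r"^\s*(?:Public\s+|Private\s+)?Function\s+(\w+)\s*\(", re.IGNORECASE)
# Any `name(` occurrence is a candidate array access or function call.
REFERENCE = re.compile(r"(\w+)\s*\(")


def parse(lines):
    """Classify each `name(` reference using only declarations seen so far."""
    declared = {}    # lower-cased name -> "array" | "function"
    classified = []  # (name, "array_access" | "function_call") in source order
    for line in lines:
        match = DIM_ARRAY.match(line)
        if match:
            declared[match.group(1).lower()] = "array"
            continue
        match = FUNCTION.match(line)
        if match:
            declared[match.group(1).lower()] = "function"
            continue
        for ref in REFERENCE.finditer(line):
            name = ref.group(1)
            kind = declared.get(name.lower())
            if kind == "array":
                classified.append((name, "array_access"))
            elif kind == "function":
                classified.append((name, "function_call"))
            # Unknown names are skipped: nothing has been declared for them yet.
    return classified
```

The sketch also shows the awkward part of this design: a reference that appears before its declaration cannot be classified, and VBA allows a procedure to be called from code that precedes its definition in the module.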

I prefer the parse-then-resolve approach because it is cleaner and more maintainable.

Ira Baxter
  • Did I write a [vexing parse](https://github.com/rubberduck-vba/Rubberduck/issues/973)? It's not using ambiguous syntax, but it is abusing symbols. The examples in my answer to this question do use ambiguous syntax though... – ThunderFrame Nov 14 '16 at 09:36
  • So, "the former" would be the part that's in parentheses, right? "calling arbitrary code at various points in the parsing process"? The ANTLR guy is telling me that this would indeed be the way to go. So here, have a checkmark - I have homework and reading to do! =) – Mathieu Guindon Nov 14 '16 at 19:02
  • @Mat'sMug Modified to make it clear what I meant by "the former": parse-then-resolve. – Ira Baxter Nov 14 '16 at 22:07

You can't tell an array from a procedure call. Even at resolution time, you still can't necessarily know, as the sub-type of the variable might change as late as run-time.

This example shows the impact of default members that accept optional arguments:

  Dim var As Variant

  Set var = Range("A1:B2")
  Debug.Print var(1, 1)     'Accesses the _Default/Item property with index arguments

  var = var                 'Accesses the _Default/Item property without arguments
  Debug.Print var(1, 1)     'Array indices

You can't even reliably tell whether parentheses applied to the result of another call are themselves a procedure call or an array index:

  Dim var1 As Variant
  Set var1 = New Dictionary
  Dim var2 As Variant
  Set var2 = New Dictionary
  var2.Add 0, "Foo"
  var1.Add 0, var2
  Debug.Print var1(0)(0)    'Accesses the default/Item of the default/Item

  var1 = Array(Array(1))
  Debug.Print var1(0)(0)    'Accesses the first index of the first index

You'll need to treat parenthesized blocks that follow a variable name as possibly belonging to a procedure or an array. In fact, it might even be useful to think of accessing an array member as if it has a default Item member. That way, an array is no different to an object with a default member that requires arguments that happen to be indices (and happens to have dedicated constructor syntaxes).
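One way to picture that unification is the sketch below, in Python rather than VBA. The class names `VbaArray` and `VbaDictionary`, the `default_member` method, and the `index` helper are all hypothetical, and arrays are limited to one dimension to keep it short.

```python
class VbaArray:
    """Array modeled as an object whose default member takes an index."""

    def __init__(self, items):
        self.items = list(items)

    def default_member(self, position):
        return self.items[position]


class VbaDictionary:
    """Dictionary-like object whose default member (Item) takes a key."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def default_member(self, key):
        return self.entries[key]


def index(target, *args):
    """Evaluate `target(args)` without caring whether target is an array."""
    return target.default_member(*args)


# var1(0)(0) where var1 holds a dictionary of dictionaries
nested_dict = VbaDictionary({0: VbaDictionary({0: "Foo"})})
# var1(0)(0) where var1 holds an array of arrays
nested_array = VbaArray([VbaArray([1])])
```

Both `index(index(nested_dict, 0), 0)` and `index(index(nested_array, 0), 0)` go through the same code path, so the tree node for `var1(0)(0)` does not need to know which case it is until the value is actually inspected.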

ThunderFrame