I have the following grammar:

grammar Test;

options {
  language = Python;
}

declaration returns [value]
    :     'enum' ID '{' statement* '}'
                 { $value = {'id': $ID.text,
                             'fields': $statement.value}
                 }
    ;

statement returns [value]
    :     ID ':' INT ';' { $value = {'id': $ID.text, 'value': int($INT.text)} }
    ;

To parse syntax of type:

enum Test {
  Foo: 3;
  Bar: 5;
}

However, I am struggling with getting the statement* rule into a list of statements. I want my final parsed object to look like:

declaration = {
  'id': 'Test',
  'fields': [
    {'id': 'Foo', 'value': 3},
    {'id': 'Bar', 'value': 5},
  ]
}

I can parse each statement correctly, so each individual $statement.value is right. But given the asterisk on statement* in the declaration rule, is there an easy way to collect the results into a list of fields? I was hoping for some syntax that gives me this for free.

Right now this just takes the last statement, so it returns:

declaration = {
  'id': 'Test',
  'fields': [
    {'id': 'Bar', 'value': 5},
  ]
}

I want a generic solution because my grammar has a lot of rules of the form:

some_declaration
      :     keyword ID '{' declaration_statement* '}'
      ;

Note: I am coding this in Python. I have tried writing this as a parser followed by a tree grammar, but even then I only get the last element; the rest are discarded.

Arindam
1 Answer

You can do it like this:

declaration returns [value]
    :     'enum' ID 
                 { $value = {'id': $ID.text,
                             'fields': []}
                 }
           '{' (r=statement
                 { $value['fields'].append($r.value) }
             )*
           '}'
    ;
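
To make the accumulation pattern concrete, here is a rough hand-written equivalent in plain Python. It is only a sketch: the `tokenize`, `parse_statement` and `parse_declaration` helpers are hypothetical stand-ins, not what ANTLR generates. The point is that the list is created once, before the loop, and each loop iteration appends one statement result instead of overwriting a single variable.

```python
import re

# Hypothetical tokenizer standing in for the ANTLR lexer (an assumption,
# not part of the original grammar): identifiers, integers, punctuation.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[{}:;])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_statement(tokens, pos):
    """statement : ID ':' INT ';'  (returns the value and the new position)."""
    name, colon, number, semi = tokens[pos:pos + 4]
    if colon != ":" or semi != ";":
        raise SyntaxError("malformed statement near %r" % name)
    return {"id": name, "value": int(number)}, pos + 4


def parse_declaration(tokens):
    """declaration : 'enum' ID '{' statement* '}'"""
    if len(tokens) < 4 or tokens[0] != "enum" or tokens[2] != "{":
        raise SyntaxError("expected: enum ID {")
    # Create the result (with an empty list) before the loop starts ...
    value = {"id": tokens[1], "fields": []}
    pos = 3
    while pos < len(tokens) and tokens[pos] != "}":
        result, pos = parse_statement(tokens, pos)
        # ... and append inside the loop, once per matched statement.
        value["fields"].append(result)
    if pos >= len(tokens):
        raise SyntaxError("missing closing brace")
    return value


declaration = parse_declaration(tokenize("enum Test { Foo: 3; Bar: 5; }"))
print(declaration)
```

In the grammar, the action placed inside `( ... )*` plays the role of the loop body here, which is why it runs for every statement rather than only once at the end.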

Or you can also pass your fields list to the statement rule as a parameter, and append new values there. Something like this:

declaration returns [value]
    :     'enum' ID 
                 { $value = {'id': $ID.text,
                             'fields': []}
                 }
                 { list = $value['fields']
                 }
            '{' statement[list]* '}'
    ;

statement[list]
    :     ID ':' INT ';' { $list.append({'id': $ID.text, 'value': int($INT.text)}) }
    ;
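
The second variant relies on Python passing the list object itself, not a copy, so anything the `statement` rule appends is visible through `$value['fields']`. A minimal plain-Python sketch of that idea follows; the `statement` function and the hard-coded input pairs are illustrative assumptions, not generated parser code.

```python
def statement(fields, name, number):
    # Mirrors statement[list]: returns nothing and instead appends its
    # result to the list handed in by the caller.
    fields.append({"id": name, "value": int(number)})


value = {"id": "Test", "fields": []}
fields = value["fields"]  # the same list object, not a copy

# Stand-in for the parser matching statement* twice.
for name, number in [("Foo", "3"), ("Bar", "5")]:
    statement(fields, name, number)

print(fields is value["fields"])
print(value)
```

The trade-off is that `statement` no longer returns anything on its own, so it cannot be reused in a context where no list is available to pass in.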

Which one to use depends on what you want, but the first option is probably a bit nicer.

Here you can find some more examples on returning values from one rule to another: Two basic ANTLR questions

lp_
  • Thanks. I ended up using the first construct that you mentioned. Also, what's the use of the += operator then? – Arindam Dec 10 '14 at 22:22
  • Do you mean using `+=` instead of `append` (e.g. `$value['fields']+=$r.value` vs `$value['fields'].append($r.value)`)? They are slightly different; check out [this post](http://stackoverflow.com/questions/2022031/python-append-vs-operator-on-lists-why-do-these-give-different-results). – lp_ Dec 11 '14 at 08:42
  • Ah, no, I meant the antlr3 += operator. I saw a few uses of it in the form: `c=ID ('.' c+=ID)+` – Arindam Dec 12 '14 at 02:07
  • Aha! `+=` is useful for building a list of matched tokens or for collecting ASTs, but I don't think you can simply use it with a grammar rule, so you cannot just say `l+=statement` (which is probably what you'd need here). Perhaps you could write `(ids+=ID ':' ints+=INT ';')*` in the declaration instead of `statement*` and then process the `ids` and `ints` lists, but it's not a nice solution ;) Check out section 4.3 of _The Definitive ANTLR Reference_, or just try replacing `ID` in `statement` with `ids+=ID` and look at the generated Python code to see what happens there. – lp_ Dec 12 '14 at 10:01
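
Two points from the comments can be illustrated in plain Python. The first part shows why `+=` on a list is not a drop-in replacement for `append` when the item is a dict. The second part sketches how the two parallel lists from the `ids+=ID` / `ints+=INT` idea might be combined. Plain strings are used here as an assumption; in a real ANTLR 3 parser the labelled lists would hold token objects, so you would read each token's `text` attribute instead.

```python
# Part 1: append vs. += on a Python list.
item = {"id": "Foo", "value": 3}

appended = []
appended.append(item)  # adds the dict itself as one element

extended = []
extended += item       # iterates over the dict, so its keys get added

print(appended)
print(extended)

# Part 2: pairing up two parallel lists (stand-ins for `ids` and `ints`).
ids = ["Foo", "Bar"]
ints = ["3", "5"]

fields = [{"id": name, "value": int(number)} for name, number in zip(ids, ints)]
print(fields)
```

Note that `zip` stops at the shorter list, so this pairing only makes sense when every matched `ID` has a matching `INT`.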