So, I'm trying to parse something like this:

permit 16 any eq 30 www any eq 80 established log-input

The parse tree I'm aiming for looks like this: [image: actual output from test rig]

As you can see, the 16 is my problem. I've nested rules, and it doesn't like it.

The relevant section...

ace  : remarks? action source destination ops;
action: ( P | D ) PROTO ;
P : 'permit' ;
D : 'deny' ;
NUMBER : [0-9]+ ;
PROTO : 'ip'
      | 'tcp'
      | 'udp'
      | 'eigrp'
      | 'icmp'
      | NUMBER
      ;
ID : [a-zA-Z-]+ ;

If NUMBER is defined first, the 16 shows up red; if PROTO is defined first, then all the ports downstream turn red.

I get that it's just running my LEX rules in order, and they are ambiguous. PROTO can match any number, and so can NUMBER.

However, I tried to solve that by nesting them and using fragments, to no avail.

ace  : remarks? action source destination ops;
action: ( P | D ) ;
P : 'permit' PROTO;
D : 'deny' PROTO;
NUMBER : [0-9]+ ;
fragment PROTO : 'ip'
      | 'tcp'
      | 'udp'
      | 'eigrp'
      | 'icmp'
      | NUMBER
      ;
ID : [a-zA-Z-]+ ;

As soon as I do that, my 'catch-all' ID starts gobbling everything up. I still get a tree, but all my token types turn into IDs.

I've looked around this forum and the gargler for hours, and I haven't seen any way to sort this out. Oddly, though, the behavior I want is working elsewhere in this same grammar.

destination : address ports? ;
address : ADDRESS ADDRESS | HST ADDRESS | ANY ;
ADDRESS : QUAD DOT QUAD DOT QUAD DOT QUAD ;
fragment QUAD : TWO LO5 LO5 | TWO LO4 DIG | ONE DIG DIG | DIG DIG | DIG ;
fragment DOT : '.' ;
fragment ONE : [1] ;
fragment TWO : [2] ;
fragment LO4 : [0-4] ;
fragment LO5 : [0-5] ;
fragment DIG : [0-9] ;

That works like a champ: it only grabs IPs and host addresses, without failure. Of course the 'ports?' section still goes to garbage. However, using that same setup, I can't seem to grab PORT/PROTOCOL.

I'm missing something fundamental, and after rearranging this thing for far too long... I'm wondering if I should not try and get such specific TOKEN IDs, and handle it in post (aka, with a listener later) or if my tree should contain proper token tags.

** Technically protocols addressed by number should be < 256 so they are a QUAD as I've defined them, but I can't get that working...
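
One way the under-256 idea could be made to work, sketched with the `QUAD` fragments defined earlier in the grammar: promote `QUAD` to a real token placed before `NUMBER`. The names `BYTE`, `ProtoByteSketch`, the keyword tokens (`IP`, `TCP`, `UDP`, `EIGRP`, `ICMP`), the `port` rule and `WS` are all made up for illustration. The catch is that the lexer has no context, so every number from 0 to 255 becomes a `BYTE`, ports included, and the parser rules have to allow for that.

```antlr
grammar ProtoByteSketch;

action : ( P | D ) proto ;
proto  : IP | TCP | UDP | EIGRP | ICMP | BYTE ;  // a numeric protocol must be 0-255
port   : BYTE | NUMBER ;                         // a port may be small or large

P      : 'permit' ;
D      : 'deny' ;
IP     : 'ip' ;
TCP    : 'tcp' ;
UDP    : 'udp' ;
EIGRP  : 'eigrp' ;
ICMP   : 'icmp' ;
BYTE   : QUAD ;       // listed before NUMBER, so it wins when both match the same text
NUMBER : [0-9]+ ;     // still wins for 256 and up, because it matches more characters
WS     : [ \t\r\n]+ -> skip ;

// Same fragments as the ADDRESS rule uses.
fragment QUAD : TWO LO5 LO5 | TWO LO4 DIG | ONE DIG DIG | DIG DIG | DIG ;
fragment ONE  : [1] ;
fragment TWO  : [2] ;
fragment LO4  : [0-4] ;
fragment LO5  : [0-5] ;
fragment DIG  : [0-9] ;
```

Whether this is worth it compared with accepting any `NUMBER` and checking the range in a listener afterwards is a judgment call; the listener route usually gives friendlier error messages.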

Ideas? Suggestions? I have the tree, so who cares if it's a number in that spot? I know the parent is action, so the right-hand tree being a number should be validated as less than 256 later? I assume the ambiguity is killing it. Could I redesign this thing to remove all ambiguity somehow?

(BTW I'm a self-taught novice, so try and speak to me like I have no college education in computer science, I've never read the dragon book, and I've been programming with ANTLR for 4 days... because that is who you're talking to.)

Allen
  • Yeah, that's the #1 ANTLR misunderstanding - the lexer is totally separated from the parser, read [my post here](https://stackoverflow.com/a/46267981/3764814). – Lucas Trzesniewski Oct 01 '17 at 22:32
  • I get that the parsing logic is isolated from the lexing logic. I guess I'm not sure why `P : 'permit' (PROTO|NUMBER) ;` doesn't match. I've created a 'longer' rule: permit + 16 is longer than permit or NUMBER individually, so I would think P would match. Yet it doesn't. Maybe because I'm ditching the space. Not sure. – Allen Oct 01 '17 at 22:48
  • The issue here is that `PROTO` includes the `NUMBER` alternative but it can never match a `NUMBER`, because `NUMBER` is defined first. Your `action` rule expects `P` `PROTO` but receives `P` `NUMBER` instead. – Lucas Trzesniewski Oct 02 '17 at 08:10
  • I see that in what I posted now, thank you. Let me ask a different question about a basic, simple 3-line lexer: `PERMIT : PER PORT ;` `PORT : [0-9]+ ;` `PER : 'permit' ;`. PERMIT will never match... PORT matches, PER matches, but PERMIT will not. I have a lexer rule PERMIT with a 'longer' match, but does the longer match really mean it has to be contiguous characters? – Allen Oct 02 '17 at 19:15
  • Yes, tokens are formed of contiguous characters. Your rule would match `permit42` for instance. A parser rule would be a better fit there I think. – Lucas Trzesniewski Oct 02 '17 at 21:39
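
A minimal sketch of the parser-rule approach suggested in the comments. The keyword tokens `IP`, `TCP`, `UDP`, `EIGRP`, `ICMP`, the `proto` rule, the `WS` rule and the grammar name `AceSketch` are assumptions for illustration; they do not appear in the question's grammar. The idea is that the lexer only decides "this is a run of digits", and the parser decides what that number means from where it sits.

```antlr
grammar AceSketch;

// Parser rules (lowercase) see tokens in context, so the choice
// "keyword or number" belongs here rather than in the lexer.
action : ( P | D ) proto ;
proto  : IP | TCP | UDP | EIGRP | ICMP | NUMBER ;

// Lexer rules (uppercase) run with no knowledge of the parser.
// The longest match wins; on a tie, the rule defined first wins.
P      : 'permit' ;
D      : 'deny' ;
IP     : 'ip' ;
TCP    : 'tcp' ;
UDP    : 'udp' ;
EIGRP  : 'eigrp' ;
ICMP   : 'icmp' ;
NUMBER : [0-9]+ ;          // every run of digits is a NUMBER, protocol or port
ID     : [a-zA-Z-]+ ;      // listed after the keywords so they win ties
WS     : [ \t\r\n]+ -> skip ;
```

With this, `permit 16` should lex as `P NUMBER`, and the `NUMBER` counts as a `proto` only because of its position. The same `NUMBER` token type can then be reused for ports further down the line without the two uses fighting each other.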

1 Answer

I know this is an old thread, but I hope this helps someone. You can actually use a catch-all token to parse most values, and then convert them to more specific values (enums, constants, etc.) in your visitor/listener code. This is a stripped-down example of a grammar that worked for me.

Base Grammar:


    lexer grammar Base ;
    
    fragment LOWERCASE  : [a-z];
    fragment UPPERCASE  : [A-Z];
    fragment NUMBER     : [0-9]+;
    fragment WORD       : (LOWERCASE | UPPERCASE | NUMBER | '-' | '_' | '/')+;
    fragment NEWLINE    : '\r' '\n'
        | '\n'
        | '\r';
    
    fragment OBJECT_DESCRIPTION     : ' description';
    
    CRLF
        : NEWLINE ;
    
    VALUE
        : (WORD | '.')+ ;
    
    WHITESPACE
        : ' '   -> skip ;
    
    IGNORE
        : .     -> skip ;
    

Access rule grammar:


    grammar AccessList;
    
    import Base;
    
    //access-list acl-1 extended permit udp | object network-object-1 eq 123 (source)| 1.1.1.1 ne www (destination)|
    
    accessLists             : (accessListDestination)+;
    accessListDestination   : accessListSource accessListTarget (' rule-id' aclId = VALUE)?;
    accessListSource        : accessListProtocol accessListTarget;
    accessListTarget: (
            ({_input.LT(1).getText().matches("object|object-group")}? objectType = VALUE objectName = VALUE)
            |({_input.LT(1).getText().matches("host")}? host=VALUE ip=VALUE) | ip = VALUE 
        ) accessListPorts?;
    accessListPorts         : (accessListPort | accessListPortRange);
    accessListPortRange     : VALUE startPort=VALUE endPort=VALUE;
    accessListPort          : operatorOrObjectType=VALUE portOrPortGroup=VALUE;
    
    accessListProtocol:
        ACCESS_LIST_KEY name = VALUE accessListType = ACCESS_LIST_TYPE action = VALUE protocol = VALUE accessListInterface?;
    
    accessListInterface: 'ifc' accessListInterfaceName=VALUE;
    
    fragment ACCESS_LIST    : 'access-list';
    fragment EXTENDED       : 'extended';
    fragment ADVANCED       : 'advanced';

arun tom