
I need to count the number of classes in a valid C# source file. I wrote the following grammar:

grammar CSharpClassGrammar;

options
{
        language=CSharp2;

}

@parser::namespace { CSharpClassGrammar.Generated }
@lexer::namespace  { CSharpClassGrammar.Generated }

@header
{
        using System;
        using System.Collections.Generic;

}

@members
{
        private List<string> _classCollector = new List<string>();
        public List<string> ClassCollector { get { return _classCollector; } }

}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

csfile  : class_declaration* EOF
        ;

class_declaration
        : (ACCESSLEVEL | MODIFIERS)* PARTIAL? 'class' CLASSNAME
          class_body
          ';'?
          { _classCollector.Add($CLASSNAME.text); }
        ;

class_body
        : '{' class_declaration* '}'
        ;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

ACCESSLEVEL
        : 'public' | 'internal' | 'protected' | 'private' | 'protected internal'
        ;

MODIFIERS
        : 'static' | 'sealed' | 'abstract'
        ;

PARTIAL
        : 'partial'
        ;

CLASSNAME
        : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
        ;

COMMENT
        : '//' ~('\n'|'\r')* {$channel=HIDDEN;}
        |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
        ;

WHITESPACE
        : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
        ; 

This parser correctly counts classes (including nested ones) whose bodies are empty:

internal class DeclarationClass1
{
    class DeclarationClass2
    {
        public class DeclarationClass3
        {
            abstract class DeclarationClass4
            {
            }
        }
    }
}

I also need to count classes whose bodies are not empty, such as:

class TestClass
{
    int a = 42;

    class Nested { }
}

I need to somehow ignore all the code that is not a class declaration. In the example above, that means ignoring

int a = 42;

How can I do this? An example for another language would be fine too.
Please help!

Pavel Martynov
  • Watch out for partial classes too; note that a partial class can appear multiple times in one file, only once in one file, or spread over many files. Is Assembly.GetTypes() not an option at all? – Marc Gravell Feb 06 '11 at 15:11
  • Thanks, I am aware of partial classes. Assembly.GetTypes() does not suit me; I need to process this at the source level. – Pavel Martynov Feb 06 '11 at 15:14

1 Answer


When you're only interested in certain parts of a source file, you can set filter=true in the options { ... } section of your grammar. You then define only the tokens you're interested in, and the lexer ignores any input that none of your rules match.

Note that this only works with lexer grammars, not in combined (or parser) grammars.

A little demo:

lexer grammar CSharpClassLexer;

options {
  language=CSharp2;
  filter=true;
}

@namespace { Demo }

Comment
  :  '//' ~('\r' | '\n')*
  |  '/*' .* '*/'
  ;

String
  :  '"' ('\\' . | ~('"' | '\\' | '\r' | '\n'))* '"'
  |  '@' '"' ('"' '"' | ~'"')* '"'
  ;

Class
  :  'class' Space+ Identifier 
     {Console.WriteLine("Found class: " + $Identifier.text);}
  ;

Space
  :  ' ' | '\t' | '\r' | '\n'
  ;

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

It's important that you leave the Identifier rule in there, because you don't want Xclass Foo to be tokenized as: ['X', 'class', 'Foo']. With the Identifier rule in place, Xclass is matched as a single identifier.
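To see why the catch-all identifier rule matters, here is a minimal sketch of the same idea outside ANTLR, using a .NET regular expression whose alternatives play the role of the lexer rules. The ClassScanner type, the FindClassNames method and the pattern itself are illustrative assumptions, not part of the grammar above, and like the demo it makes no attempt to cover every C# construct.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex analogue of the filtering lexer, not ANTLR output.
static class ClassScanner
{
    // Alternatives are tried left to right at each position, so comments,
    // strings and plain identifiers are consumed before their contents can
    // be mistaken for a class declaration.
    static readonly Regex Scanner = new Regex(
        @"//[^\r\n]*"                                // line comment
      + @"|/\*.*?\*/"                                // block comment (lazy)
      + @"|@""(?:""""|[^""])*"""                     // verbatim string
      + @"|""(?:\\.|[^""\\\r\n])*"""                 // regular string
      + @"|class\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)"  // class declaration
      + @"|[A-Za-z_][A-Za-z_0-9]*",                  // any other identifier
        RegexOptions.Singleline);

    public static List<string> FindClassNames(string source)
    {
        var names = new List<string>();
        foreach (Match m in Scanner.Matches(source))
        {
            // Only the class alternative fills the "name" group.
            if (m.Groups["name"].Success)
                names.Add(m.Groups["name"].Value);
        }
        return names;
    }
}
```

For the input Xclass Foo { } class Bar { /* class Baz */ }, FindClassNames returns a list containing only Bar: Xclass is swallowed whole by the identifier alternative, and class Baz is hidden inside the comment match. Removing the last alternative would let the scan resume inside Xclass and report Foo as well.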

The grammar can be tested with the following class:

using System;
using Antlr.Runtime;

namespace Demo
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            string source = 
@"class TestClass
{
    int a = 42;

    string _class = ""inside a string literal: class FooBar {}..."";

    class Nested { 
        /* class NotAClass {} */

        // class X { }

        class DoubleNested {
            string str = @""
                multi line string 
                class Bar {}
            "";
        }
    }
}";
            Console.WriteLine("source=\n" + source + "\n-------------------------");
            ANTLRStringStream Input = new ANTLRStringStream(source);
            CSharpClassLexer Lexer = new CSharpClassLexer(Input);
            CommonTokenStream Tokens = new CommonTokenStream(Lexer);
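            // Pulling the tokens makes the lexer run over the input, which fires the action in the Class rule.
            Tokens.GetTokens();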
        }
    }
}

which produces the following output:

source=
class TestClass
{
    int a = 42;

    string _class = "inside a string literal: class FooBar {}...";

    class Nested { 
        /* class NotAClass {} */

        // class X { }

        class DoubleNested {
            string str = @"
                multi line string 
                class Bar {}
            ";
        }
    }
}
-------------------------
Found class: TestClass
Found class: Nested
Found class: DoubleNested

Note that this is just a quick demo. I am not sure whether the grammar handles every form of C# string literal properly (I am unfamiliar with C#), but it should give you a start.
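Since the question is about counting, and a comment on the question points out that a partial class can appear several times in one file, the names reported by the lexer may need de-duplicating. A minimal sketch, assuming the names have been collected into a list rather than printed; ClassCounter and CountDistinct are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the generated lexer.
static class ClassCounter
{
    // Counts each simple class name once, so a partial class declared in
    // several parts is not counted repeatedly.
    public static int CountDistinct(IEnumerable<string> classNames)
    {
        var seen = new HashSet<string>(classNames, StringComparer.Ordinal);
        return seen.Count;
    }
}
```

This compares simple names only, so two different classes that share a name (for example, nested in different outer classes) would also be merged. Telling those apart would require tracking the enclosing scope, which the filtering lexer does not do.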

Good luck!

Bart Kiers
  • @Ben, you probably missed my remark that stated I am unfamiliar with C# and the fact that what I posted was just a quick demo to get the one asking the question started. – Bart Kiers Feb 06 '11 at 20:21
  • Ok, I'll comment on the question itself instead. – Ben Voigt Feb 06 '11 at 20:39
  • @Ben, no problem if you leave it here (as a warning for anyone blindly using my example). I was just making a point that I wasn't going to go and edit my post to account for the "odd" cases you mentioned :). – Bart Kiers Feb 06 '11 at 20:50