
I currently go through all my source files, read their text with File.ReadAllLines, and want to filter out all comments with one regex, covering basically every kind of comment. I tried several regex solutions I found on the internet, such as this one:

@"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/"

And the top result when I google:

string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";

See: Regex to strip line comments from C#

The second solution won't recognize any comments.

This is what I currently do:

public static List<string> FormatList(List<string> unformattedList, string dataType)
{
    List<string> formattedList = unformattedList;

    string blockComments = @"/\*(.*?)\*/";
    string lineComments = @"//(.*?)\r?\n";
    string strings = @"""((\\[^\n]|[^""\n])*)""";
    string verbatimStrings = @"@(""[^""]*"")+";

    string regexCS = blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings;
    //regexCS = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
    string regexSQL = "";

    if (dataType.Equals("cs"))
    {
        for(int i = 0; i < formattedList.Count;i++)
        {
            string line = formattedList[i];
            line = line.Trim(' ');

            if(Regex.IsMatch(line, regexCS))
            {
                line = "";
            }

            formattedList[i] = line;
        }
    }
    else if(dataType.Equals("sql"))
    {

    }
    else
    {
        throw new Exception("Unknown DataType");
    }

    return formattedList;
}

The first regex recognizes the comments, but also matches things like

string[] bla = text.Split('\\\\');

Is there any solution to this problem, so that the regex excludes matches that are inside a string or char literal? If you have any other links I should check out, please let me know!

I tried a lot and can't figure out why this won't work for me.

[I also tried these links]

https://blog.ostermiller.org/find-comment

https://codereview.stackexchange.com/questions/167582/regular-expression-to-remove-comments

Regex to find comment in c# source file

  • Using Roslyn would be a better choice than regex. – Kenneth K. Jun 24 '19 at 11:30
  • Regexes are poor for analysing code. In the code of the question: (1) Multi-line comments will not be handled by `blockComments` because it is called from code that is effectively `foreach (individualLine in formattedList) ...`. (2) Double forward-slashes in strings will break `lineComments`, e.g. the line `stringVar = "abc//def";`. (3) There is nothing to stop `verbatimStrings` being processed as `strings`. (4, 5, 6, and more) There are several other problems. Perhaps you need to rethink the whole problem and its possible solutions. – AdrianHHH Jun 24 '19 at 11:54

1 Answer


Doing this with regexes will be very difficult, as stated in the comments. A more reliable way to eliminate comments is to use a CSharpSyntaxWalker from Roslyn. The syntax walker knows about all language constructs and won't make the hard-to-diagnose mistakes that regexes do.

Add a reference to the Microsoft.CodeAnalysis.CSharp NuGet package and inherit from CSharpSyntaxWalker.

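using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class CommentWalker : CSharpSyntaxWalker
{
    // Comments found while walking, in source order.
    public List<SyntaxTrivia> Comments { get; } = new List<SyntaxTrivia>();

    // The depth must be at least Trivia; with the base default (Node), VisitTrivia is never called.
    public CommentWalker(SyntaxWalkerDepth depth = SyntaxWalkerDepth.Trivia) : base(depth)
    {
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        if (trivia.IsKind(SyntaxKind.MultiLineCommentTrivia)
            || trivia.IsKind(SyntaxKind.SingleLineCommentTrivia))
        {
            // Record the comment; trivia.Span gives its location in the file,
            // so it can be replaced or removed later.
            Comments.Add(trivia);
        }

        base.VisitTrivia(trivia);
    }
}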

Then you can use it like so:

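// Needs: using Microsoft.CodeAnalysis; using Microsoft.CodeAnalysis.CSharp; using Microsoft.CodeAnalysis.CSharp.Syntax;
// Get the program text from your .cs file as one string (e.g. File.ReadAllText rather than ReadAllLines)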
SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
CompilationUnitSyntax root = tree.GetCompilationUnitRoot();

var walker = new CommentWalker();
walker.Visit(root);

// Now iterate your list of comments (probably backwards) and remove them.
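
The removal step could look roughly like the following. This is only a minimal sketch, not a definitive implementation: the names `CommentSpanCollector`, `CommentStripper` and `StripComments` are made up for illustration, and the collector is a self-contained variant of the walker above that stores each comment's `TextSpan`. It assumes the whole file is passed in as one string and that documentation comments (`///`) should be kept, since they are separate trivia kinds.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Hypothetical helper: records the span of every ordinary comment it visits.
class CommentSpanCollector : CSharpSyntaxWalker
{
    public List<TextSpan> Spans { get; } = new List<TextSpan>();

    // Trivia is only visited when the depth is at least SyntaxWalkerDepth.Trivia.
    public CommentSpanCollector() : base(SyntaxWalkerDepth.Trivia)
    {
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        if (trivia.IsKind(SyntaxKind.SingleLineCommentTrivia)
            || trivia.IsKind(SyntaxKind.MultiLineCommentTrivia))
        {
            Spans.Add(trivia.Span);
        }

        base.VisitTrivia(trivia);
    }
}

static class CommentStripper
{
    // Returns programText with all // and /* */ comments cut out.
    public static string StripComments(string programText)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
        var collector = new CommentSpanCollector();
        collector.Visit(tree.GetRoot());

        var result = new StringBuilder(programText);

        // The walker visits in source order, so go backwards:
        // removing a later span does not shift the offsets of earlier ones.
        for (int i = collector.Spans.Count - 1; i >= 0; i--)
        {
            TextSpan span = collector.Spans[i];
            result.Remove(span.Start, span.Length);
        }

        return result.ToString();
    }
}
```

Calling `CommentStripper.StripComments(File.ReadAllText(path))` (with `path` being whatever file you are processing) leaves string and char literals such as `"abc//def"` or `'\\'` untouched, because the parser never classifies them as comment trivia. Whitespace that preceded a removed comment stays in place, so you may want to trim line ends afterwards.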

Further reading:

Jesse de Wit