I need to write a function that deletes all comments from a text (code). My code is almost finished, but it doesn't work if a comment starts on the first line of the file: it throws an index out of bounds error. I tried changing the for loops to start from 1 and the condition to (text[i] == '/' && text[i - 1] == '/'), but that doesn't work either. Any suggestions on how I can fix this or improve my code? It looks weird.

public void RemoveComments(string text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (text[i] == '/' && text[i + 1] == '/')
                {
                    text = text.Remove(i, 2);
                    for (int j = i; j < text.Length; j++)
                    {
                        if (text[j] != '\n')
                        {
                            text = text.Remove(j, 1);
                            j--;
                        }
                        else if (text[j] == '\n')
                        {
                            text = text.Remove(j, 1);
                            j--;
                            while (text[j] == ' ')
                            {
                                text = text.Remove(j, 1);
                                j--;
                            }
                            i = j;
                            break;
                        }
                    }
                }

                else if (text[i] == '/' && text[i + 1] == '*')
                {
                    text = text.Remove(i, 2);
                    for (int j = i; j < text.Length; j++)
                    {
                        if (text[j] != '*' && text[j + 1] != '/')
                        {
                            text = text.Remove(j, 1);
                            j--;
                        }

                        else if (text[j] == '*' && text[j + 1] == '/')
                        {
                            text = text.Remove(j, 2);
                            j = j - 2;
                            while (text[j] == ' ')
                            {
                                text = text.Remove(j, 1);
                                j--;
                                if (text[j] == '\n')
                                {
                                    text = text.Remove(j, 1);
                                    j--;
                                }
                            }
                            i = j;
                            break;
                        }

                    }
                }
            }
            Console.WriteLine(text);
        }

EDIT: After many experiments, I found that the problem is this loop (in the // branch). I needed it to fix some small alignment problems:

while (text[j] == ' ')
{
    text = text.Remove(j, 1);
    j--;
}

Test.txt file.

//int a;
int c; //int d;
Console.Write/*Line*/("Hhehehe");
if(1>0)
/*ConsoleWriteLine("Yes")*/
//Nooo
D0mm
  • Possible duplicate of [What is an "index out of range" exception, and how do I fix it?](https://stackoverflow.com/questions/24812679/what-is-an-index-out-of-range-exception-and-how-do-i-fix-it) – mjwills Nov 16 '18 at 12:58
  • I think you missed the case of block comment? – Antoine V Nov 16 '18 at 12:58
  • Have you debugged the code? Which line of the code throws this exception? – AksharRoop Nov 16 '18 at 12:58
  • It removed all comments from the big code, but then I added a comment to the first line of the code and it crashed. – D0mm Nov 16 '18 at 13:01
  • @AntoineV: `else if (text[i] == '/' && text[i + 1] == '*')` – Tim Schmelter Nov 16 '18 at 13:02
  • Why don't you simply split the input text on the newline characters? Then, if you find the double //, you can remove everything from that point on. Of course, that is still not enough to handle the _/* .....many lines here ... */_ block comment style. – Steve Nov 16 '18 at 13:02
  • Is this only for C# code? If so, consider using the Microsoft Compiler Services (Roslyn) to parse and manipulate comments. – Crowcoder Nov 16 '18 at 13:04
  • @D0mm: I guess it would help if you'd add a small sample file with 2-3 lines that demonstrates the issue. – Tim Schmelter Nov 16 '18 at 13:04
  • @TimSchmelter I mean a block comment like: /*FilePath = fbd.FileName; enter await Task.Run(() => CallWebService());*/ – Antoine V Nov 16 '18 at 13:05
  • OK, I will add a small example of the file text (code). – D0mm Nov 16 '18 at 13:06
  • `i < text.Length - 1` to avoid testing the index `[i + 1]` out of range – Cid Nov 16 '18 at 13:06
  • @AntoineV: well, he removes the part between `/*` and `*/` in the code that I have shown. I haven't tested it though, but at first sight it looks like it is what you are missing. – Tim Schmelter Nov 16 '18 at 13:07
  • Sorry, it can't be that simple; you have to implement a parser. What if you have code like this: `string demo = "//demo//";` – Dmitry Bychenko Nov 16 '18 at 13:07
  • One of the rare occasions that I'd recommend using RegEx ... – Fildor Nov 16 '18 at 13:11
  • @Fildor: because Regex understands C# code? – Tim Schmelter Nov 16 '18 at 13:12
  • Another Question: Do you also want to remove Documentation? ( `/// ... `) ? – Fildor Nov 16 '18 at 13:12
  • @TimSchmelter No. But if I had to do it, I wouldn't start implementing it with a bunch of for loops. I'd do search and remove with regex patterns. Seems easier to me. Of course, using Roslyn as Crowcoder suggests may work even better, but I am not familiar with that. – Fildor Nov 16 '18 at 13:14
  • @Fildor: I don't even know all the edge cases that are possible with C# code. `demo = "//demo//";` is not a comment but `demo = "//demo"; //";` is. You'd have to rebuild a compiler with regex, have fun. – Tim Schmelter Nov 16 '18 at 13:22
  • @TimSchmelter Meh, I guess you are right. So you would recommend Roslyn, too? – Fildor Nov 16 '18 at 13:24
  • @Fildor: I can tell you from experience that building a lexical scanner for this is a whole lot easier than trying to do it with regex. But if I had to do it today, I'd use Roslyn. – Jim Mischel Nov 16 '18 at 16:45

2 Answers

It looks like you have C# code files, so you can use the power of Roslyn (the Microsoft.CodeAnalysis.CSharp NuGet package). Simply parse the code file into a syntax tree and then visit that tree with a visitor that skips comments:

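using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
var code = File.ReadAllText("Code.cs");
SyntaxTree tree = CSharpSyntaxTree.ParseText(code);
var root = (CompilationUnitSyntax)tree.GetRoot();
var codeWithoutComments = new CommentsRemover().Visit(root).ToString();
Console.WriteLine(codeWithoutComments);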

Visitor:

class CommentsRemover : CSharpSyntaxRewriter
{
    public override SyntaxTrivia VisitTrivia(SyntaxTrivia trivia)
    {
        switch (trivia.Kind())
        {
            case SyntaxKind.SingleLineCommentTrivia:
            case SyntaxKind.MultiLineCommentTrivia:
                return default; // on C# 7.0 or earlier, use: new SyntaxTrivia()
            default:
                return trivia;
        }
    }
}

Sample code file:

using System;
using System.Collections.Generic;
using System.Text;

namespace ConsoleApp
{
    /* Sample
       Multiline Comment */
    class Program
    {
        static void Main(string[] args)
        {
            // Comment
            Console.Write/*Line*/("Hello, World!"); // Print greeting
            /*ConsoleWriteLine("Yes")*/
        }
    }
}

Output:

using System;
using System.Collections.Generic;
using System.Text;

namespace ConsoleApp
{

    class Program
    {
        static void Main(string[] args)
        {

            Console.Write("Hello, World!");

        }
    }
}

Notes: As you can see, removing comments from lines that contained nothing but a comment leaves empty lines behind. You can create one more visitor to remove the empty lines. Also consider removing XML comments as well.
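
For the XML comments part, a possible sketch, assuming the same Roslyn package as in the visitor above: the documentation-comment trivia kinds can be dropped in the same switch. `AllCommentsRemover` and its `Clean` helper are hypothetical names used only for illustration.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical variant of the visitor above that also drops XML doc comments.
class AllCommentsRemover : CSharpSyntaxRewriter
{
    public override SyntaxTrivia VisitTrivia(SyntaxTrivia trivia)
    {
        switch (trivia.Kind())
        {
            case SyntaxKind.SingleLineCommentTrivia:
            case SyntaxKind.MultiLineCommentTrivia:
            case SyntaxKind.SingleLineDocumentationCommentTrivia: // "/// ..."
            case SyntaxKind.MultiLineDocumentationCommentTrivia:  // "/** ... */"
                return default; // an empty trivia is dropped from the rewritten tree
            default:
                return trivia;
        }
    }

    // Illustrative helper: parse the text, rewrite the tree, return the new text.
    public static string Clean(string code)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(code);
        return new AllCommentsRemover().Visit(tree.GetRoot()).ToFullString();
    }
}
```

Whitespace and line breaks that surrounded a removed comment are separate trivia, so they stay in the output and still need their own cleanup pass.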

whoopdedoo
Sergey Berezovskiy

You have a loop based on text.Length:

for (int i = 0; i < text.Length; i++)

But inside the loop you are shortening the text. At some point it is shorter than the original text.Length, and you run past the end of it, I guess.
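
One way to sidestep this class of bug, sketched under some assumptions: build the result in a `StringBuilder` instead of calling `Remove` on `text` while indexing into it, and read `text[i + 1]` only after checking that it exists. In the question's code, the reads that can go out of range are `text[i + 1]` on the last character and `text[j]` after `j--` reaches -1, which is what happens when a comment starts at index 0. `CommentStripper.Strip` is a hypothetical helper, not the asker's method. It protects regular string and char literals only (no verbatim, interpolated, or raw strings), so it is not a substitute for a real parser.

```csharp
using System;
using System.Text;

static class CommentStripper
{
    // Hypothetical helper: copies everything that is not a comment into a
    // StringBuilder, so the input is never modified while it is being indexed.
    public static string Strip(string text)
    {
        var result = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            // Look ahead only when a next character exists.
            char next = i + 1 < text.Length ? text[i + 1] : '\0';

            if (c == '/' && next == '/')
            {
                // Line comment: skip up to, but not including, the line break.
                while (i < text.Length && text[i] != '\n' && text[i] != '\r') i++;
            }
            else if (c == '/' && next == '*')
            {
                // Block comment: skip past the closing "*/", or to the end if unterminated.
                int end = text.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? text.Length : end + 2;
            }
            else if (c == '"' || c == '\'')
            {
                // Regular string or char literal: copy it verbatim so that
                // "//demo//" is not mistaken for a comment.
                result.Append(c);
                i++;
                while (i < text.Length && text[i] != c && text[i] != '\n')
                {
                    if (text[i] == '\\' && i + 1 < text.Length)
                    {
                        result.Append(text[i]); // keep the backslash of an escape
                        i++;
                    }
                    result.Append(text[i]);
                    i++;
                }
                if (i < text.Length && text[i] == c)
                {
                    result.Append(c); // closing quote
                    i++;
                }
            }
            else
            {
                result.Append(c);
                i++;
            }
        }
        return result.ToString();
    }
}
```

Because no index is ever decremented, a comment on the first line is handled like any other. Trailing spaces and empty lines left behind by removed comments are kept and would need a separate pass.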

Wilhelm