Remove comments from file

Question

I have a text file like this

/* 
This is a comment 
I a looking to delete it
*/
//CALCULATE;     
Language([Dim Currency].[Currency].&[4]) = 2057;     
Language([Dim Currency].[Currency].&[2]) = 2067;

I've tried this code

var newLines = oldLines.Select(line => new { 
                Line = line, 
                Words = line.Split("/*") 
            })
            .Where(lineInfo => !lineInfo.Words.Contains(wordToDelete))
            .Select(lineInfo => lineInfo.Line);
var newLines1 = oldLines.Select(line => new { 
            Line = line, 
            Words = line.Split("*/") 
        })
        .Where(lineInfo => !lineInfo.Words.Contains(wordToDelete))
        .Select(lineInfo => lineInfo.Line);

The code returns this

This is a comment 
I a looking to delete it
//CALCULATE;     
Language([Dim Currency].[Currency].&[4]) = 2057;     
Language([Dim Currency].[Currency].&[2]) = 2067;

How can I modify my LINQ to make the result look like this (without block comments):

   //CALCULATE;     
    Language([Dim Currency].[Currency].&[4]) = 2057;     
    Language([Dim Currency].[Currency].&[2]) = 2067;

@GuilhermeOliveira: Please undelete your answer. I said that it wouldn't support comments which didn't start at the beginning of the line, but on closer inspection I was reading your code wrong. It should be ok. — Ben Voigt, Apr 07 '14 at 14:57
I think you need to follow a two-pass strategy: a first pass to remove all multi-line comments ( /* """ */ ) and a second pass to remove single-line comments ( // "" ) — spetzz, Apr 07 '14 at 15:08
Potential duplicate of http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689 — kevchadders, Apr 07 '14 at 15:10

score 2 · Accepted Answer · edited May 23 '17 at 11:49

This is the perfect use case for the Aggregate LINQ operator, because you're turning a list of strings (the result of splitting your input file into separate lines) into a single string: the input file without comment blocks. In general, reach for Aggregate when you want to reduce a list to a single value, or when you want to carry state from one element of the sequence to the next. Here, a useful piece of state to carry is a boolean answering "are we in a comment block?".

In the query below, I made the simplifying assumption that begin and end comments will always be on their own line. If that is not the case, then the body of the Aggregate becomes more complex, but is essentially the same (you'd need to add code to handle splitting the line on either "/*" or "*/"). Here's a query that does what you need:

var inComment = false; // start off assuming we're not in a comment
// assume lines is some IEnumerable<string> representing the lines of your file,
// perhaps from a call to File.ReadAllLines(<file name>)
// (Aggregate is an extension method, so this needs "using System.Linq;")
var result = 
    lines.Aggregate(new System.Text.StringBuilder(),
                    (builder, line) => {
                         // NotInComment state: does a comment block start on this line?
                         // more code here if "/*" isn't on its own line
                         if (!inComment)
                             inComment = line.StartsWith("/*");

                         if (inComment)
                         {
                             // InComment state: skip this line, and leave the
                             // state once the closing delimiter is seen
                             // more code here if "*/" isn't on its own line
                             inComment = !line.StartsWith("*/");
                             return builder;
                         }

                         // not part of a comment block, so keep the line
                         builder.AppendLine(line);
                         return builder;
                     }).ToString();

To simplify the example, I did not include the "are we in a comment block" state in the Aggregate method, and instead closed over the variable inComment. The closure could be removed by changing the accumulator type of the Aggregate to Tuple<Boolean, StringBuilder> (instead of StringBuilder, as it is in the query above) and using Item1 instead of inComment and Item2 instead of builder. Because Tuple is immutable, the lambda would return a new tuple whenever the state changes rather than assigning to Item1.
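The closure-free variant described above might look something like the following. This is only a sketch under the same simplifying assumption (each "/*" and "*/" starts its own line). The class and method names (CommentStripper, StripBlockComments) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CommentStripper
{
    // Hypothetical helper: removes block comments whose "/*" and "*/"
    // each start a line (the same simplifying assumption as the query above).
    public static string StripBlockComments(IEnumerable<string> lines)
    {
        // Item1: are we inside a comment block? Item2: the output so far.
        var seed = Tuple.Create(false, new StringBuilder());

        return lines.Aggregate(seed, (state, line) =>
        {
            var inComment = state.Item1
                            || line.StartsWith("/*", StringComparison.Ordinal);
            if (inComment)
            {
                // Skip the line; leave the comment once a line starts with "*/".
                var stillInComment = !line.StartsWith("*/", StringComparison.Ordinal);
                return Tuple.Create(stillInComment, state.Item2);
            }

            // Not in a comment: keep the line and pass the state along unchanged.
            state.Item2.AppendLine(line);
            return state;
        }).Item2.ToString();
    }
}
```

Calling CommentStripper.StripBlockComments(File.ReadAllLines(path)) would then return the file's text without the comment blocks. The StringBuilder inside the tuple is still mutated in place, so only the boolean is truly threaded through the accumulator.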

Edit: I didn't explain the body of the Aggregate method, which might be valuable, especially since other commenters linked to SO questions using regular expressions. First off, you cannot remove all the comment blocks with a single regular expression; you'd have to use a regular expression as well as some additional logic. In the linked post, this additional logic was provided by the Regex.Replace method. That is a far more heavyweight solution than is required here. Instead, you want a simple state machine with two states: InComment and NotInComment. When you're in the InComment state, you check whether the comment you're in ends on the current line, and if so you move to the NotInComment state. When you're in the NotInComment state, you check whether a comment starts on the current line. If so, you skip the line and move to the InComment state. If not, you add that line to the output. The InComment state is represented by the if (inComment) block; the NotInComment state is everything else.
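If the delimiters can appear in the middle of a line, the same two-state machine can scan within each line instead of testing only its start. The following is a rough sketch of that extension, not the answer's original code. InlineCommentStripper is a hypothetical name, and the sketch assumes that "/*" inside a string literal or after "//" should still open a comment, which may not be what your file format wants.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class InlineCommentStripper
{
    // Hypothetical helper: removes /* ... */ blocks even when the delimiters
    // appear mid-line. String literals and "//" get no special treatment.
    public static string StripBlockComments(IEnumerable<string> lines)
    {
        var output = new StringBuilder();
        var inComment = false;

        foreach (var line in lines)
        {
            var kept = new StringBuilder();
            var sawComment = inComment; // did any part of this line belong to a comment?
            var pos = 0;

            while (pos < line.Length)
            {
                if (inComment)
                {
                    // InComment state: look for the end of the comment.
                    var end = line.IndexOf("*/", pos, StringComparison.Ordinal);
                    if (end < 0) break; // comment continues on the next line
                    inComment = false;
                    pos = end + 2;
                }
                else
                {
                    // NotInComment state: keep text up to the next "/*", if any.
                    var start = line.IndexOf("/*", pos, StringComparison.Ordinal);
                    if (start < 0)
                    {
                        kept.Append(line, pos, line.Length - pos);
                        break;
                    }
                    kept.Append(line, pos, start - pos);
                    inComment = true;
                    sawComment = true;
                    pos = start + 2;
                }
            }

            // Drop lines that held nothing but comment text and whitespace.
            if (sawComment && kept.ToString().Trim().Length == 0) continue;
            output.AppendLine(kept.ToString());
        }

        return output.ToString();
    }
}
```

A plain foreach is used here rather than Aggregate because the per-line scanning makes the lambda body long. The state transitions are the same as in the Aggregate version.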
