This is the perfect use case for the Aggregate LINQ operator, because you're turning a list of strings (the result of splitting your input file into separate lines) into a single string: the input file without comment blocks. In general, reach for Aggregate when you want to reduce a list to a single value, or when you want to carry state from one element of the sequence to the next (for example, a useful piece of state to carry here is the boolean "are we in a comment block?").
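As a quick illustration of those two uses, here is a minimal sketch. The sample words are made up, and the snippet assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, not taken from the question.
var words = new[] { "alpha", "beta", "gamma" };

// Reduce the sequence to one value: the total number of characters.
int totalLength = words.Aggregate(0, (sum, word) => sum + word.Length);

// Carry state from element to element: the accumulator remembers
// the longest word seen so far (the first one wins on ties).
string longest = words.Aggregate("", (best, word) => word.Length > best.Length ? word : best);

Console.WriteLine(totalLength); // prints 14
Console.WriteLine(longest);     // prints alpha
```

In both calls the first argument is the starting value of the accumulator, and the lambda returns the accumulator to use for the next element.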
In the query below, I made the simplifying assumption that the begin and end comment markers will always be on their own lines. If that is not the case, the body of the Aggregate becomes more complex but stays essentially the same (you'd need to add code that splits the line on "/*" or "*/"). Here's a query that does what you need:
    var inComment = false; // start off assuming we're not in a comment
    // assume lines is some IEnumerable<string> representing the lines of your file,
    // perhaps from a call to File.ReadAllLines(<file name>)
    var result =
        lines.Aggregate(new System.Text.StringBuilder(),
            (builder, line) => {
                if (!inComment)
                    // more code here if "/*" isn't on its own line
                    inComment = line.StartsWith("/*");
                if (inComment)
                {
                    // more code here if "*/" isn't on its own line
                    inComment &= !line.StartsWith("*/");
                    return builder;
                }
                if (!inComment) builder.AppendLine(line);
                return builder;
            }).ToString();
To simplify the example, I did not include the "are we in a comment block" state in the Aggregate method, and instead closed over the variable inComment. Closing over inComment could be removed by changing the accumulator type of the Aggregate to Tuple<Boolean, StringBuilder> (instead of StringBuilder, as it is in the query above) and using Item1 instead of inComment and Item2 instead of builder.
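A rough sketch of that tuple-based variant might look like the following. The helper name StripCommentBlocks and the sample input are hypothetical, the snippet assumes C# 9 or later (top-level statements), and it keeps the same assumption that "/*" and "*/" each sit on their own line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper; assumes "/*" and "*/" each sit on their own line.
static string StripCommentBlocks(IEnumerable<string> lines)
{
    return lines.Aggregate(
        // Item1: are we inside a comment block?  Item2: the output so far.
        Tuple.Create(false, new StringBuilder()),
        (state, line) =>
        {
            // We are in a comment if we already were, or if one starts here.
            var inComment = state.Item1 || line.StartsWith("/*");
            if (inComment)
            {
                // Skip the line; leave the comment if this line closes it.
                return Tuple.Create(!line.StartsWith("*/"), state.Item2);
            }
            state.Item2.AppendLine(line);
            return Tuple.Create(false, state.Item2);
        }).Item2.ToString();
}

// Hypothetical sample input.
var sample = new[] { "int a = 1;", "/*", "hidden", "*/", "int b = 2;" };
Console.Write(StripCommentBlocks(sample));
```

The lambda no longer touches any outer variable, so the whole reduction is self-contained; the cost is a small Tuple allocation per line.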
Edit: I didn't explain the body of the Aggregate method, which might be valuable, especially since other commenters linked to SO questions that use regular expressions. First off, you cannot remove all the comment blocks with a single regular expression; you'd have to use a regular expression as well as some additional logic. In the linked post, this additional logic was provided by the Regex.Replace method. This is a far more heavyweight solution than is required here. Instead, you want a simple state machine with two states: InComment and NotInComment. When you're in the InComment state, you check whether the comment you're in ends on the current line, and if so you move to the NotInComment state. When you're in the NotInComment state, you check whether a comment starts on the current line. If so, you skip the line and move to the InComment state. If not, you add the line to the output. The InComment state is represented by the if (inComment) block; the NotInComment state is everything else.
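If the markers are not guaranteed to be on their own lines, the same two-state machine can be run within each line instead of once per line. The following is only a sketch under assumptions: the helper name StripBlockComments and the sample input are hypothetical, it assumes C# 9 or later (top-level statements), and it does not understand string literals or "//" comments. It uses a plain loop rather than Aggregate to keep the state transitions easy to follow.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: copes with "/*" and "*/" appearing mid-line.
static string StripBlockComments(IEnumerable<string> lines)
{
    var output = new StringBuilder();
    var inComment = false; // the current state of the machine

    foreach (var line in lines)
    {
        var kept = new StringBuilder(); // text on this line outside comments
        var sawComment = inComment;     // did any part of this line belong to a comment?
        var pos = 0;

        while (pos < line.Length)
        {
            if (inComment)
            {
                // InComment: look for the end of the comment on this line.
                var end = line.IndexOf("*/", pos, StringComparison.Ordinal);
                if (end < 0) break; // the comment continues on the next line
                inComment = false;
                pos = end + 2;
            }
            else
            {
                // NotInComment: keep text up to the next comment start, if any.
                var start = line.IndexOf("/*", pos, StringComparison.Ordinal);
                if (start < 0)
                {
                    kept.Append(line, pos, line.Length - pos);
                    break;
                }
                kept.Append(line, pos, start - pos);
                inComment = true;
                sawComment = true;
                pos = start + 2;
            }
        }

        // Drop lines that held only comment text (plus whitespace).
        if (!sawComment || kept.ToString().Trim().Length > 0)
            output.AppendLine(kept.ToString());
    }
    return output.ToString();
}

// Hypothetical sample input.
var mixed = new[]
{
    "int a = 1; /* start",
    "still hidden",
    "end */ int b = 2;",
    "/* whole line */",
    "int c = 3;"
};
Console.Write(StripBlockComments(mixed));
```

Text surrounding a comment is kept exactly as written, so whitespace that sat next to a removed comment stays in the output.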