I want to remove (Java/C/C++/..) multiline comments from a file. For this, I have written a regular expression:
/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/
This regular expression works well with Notepad++ and Geany (search and replace all with nothing), but it behaves differently in VB.NET.
I am using:
Microsoft Visual Studio 2010 (Version 10.0.40219.1 SP1Rel)
Microsoft .NET Framework (4.7.02053 SP1Rel)
The file I'm running replacements on is not that complex. I do not need to take care of any quoted text that might start or end comments.
@sln thank you for your detailed reply, I'll also quickly explain my regex as nicely as you did!
/\*                  Find the beginning of the comment.
[^\*]*               Match any chars, but not an asterisk.
We need to deal with finding an asterisk now:
(\*+[^\*/][^\*]*)*   This regex breaks down to:
    \*+              Consume asterisk(s).
    [^\*/]           Match any other char that is not an asterisk or a / (would end the comment!).
    [^\*]*           Match any other chars that are not asterisks.
    ( )*             Try to find more asterisks followed by other chars.
\*+/                 Match 1 to n asterisks and finish the comment with /.
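The breakdown above can also be written directly into the pattern. This is only a sketch: it assumes `RegexOptions.IgnorePatternWhitespace`, which lets a .NET pattern contain whitespace and `#` comments, and the module name `CommentPatternSketch` and the sample input string are made up for illustration. The pattern parts themselves are unchanged from my regex.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CommentPatternSketch
    ' Same pattern as above, one part per line.
    ' Lines are joined with vbLf so that each # comment ends at its own line.
    Private ReadOnly CommentPattern As String = String.Join(vbLf, New String() {
        "/\*          # beginning of the comment",
        "[^\*]*       # any chars, but not an asterisk",
        "(            # repeated group:",
        "  \*+        #   asterisk(s)",
        "  [^\*/]     #   one char that is neither * nor /",
        "  [^\*]*     #   any chars that are not asterisks",
        ")*           # try to find more asterisks followed by other chars",
        "\*+/         # 1 to n asterisks, then the closing /"
    })

    Sub Main()
        ' IgnorePatternWhitespace makes the engine skip the layout and comments.
        Dim remover As New Regex(CommentPattern, RegexOptions.IgnorePatternWhitespace)
        ' Hypothetical sample input, not taken from my file.
        Console.WriteLine(remover.Replace("keep /* drop * this */ keep", ""))
    End Sub
End Module
```

Writing it this way does not change what the pattern matches; it only keeps the explanation next to each part.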
Here are two code snippets:
First:
text
/*
* block comment
*
*/ /* comment1 */ /* comment2 */
My text to keep.
/* more comments */
more text
Second:
text
/*
* block comment
*
*/ /* comment1 *//* comment2 */
My text to keep.
/* more comments */
more text
The only difference is the missing space between the two comments in the second snippet:
/* comment1 *//* comment2 */
Deleting found matches with Notepad++ and Geany works perfectly for both cases. Using regular expressions from VB.NET fails for the second example. The result for the second example after deletion looks like this:
text
more text
But it should look like this:
text
My text to keep.
more text
I am using System.Text.RegularExpressions:
Dim content As String = IO.File.ReadAllText(file_path_)
Dim multiline_comment_remover As Regex = New Regex("/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")
content = multiline_comment_remover.Replace(content, "")
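To reproduce this without a file, something like the following console sketch could be used. It is an assumption-laden stand-in for my real setup: the module name `CommentRemovalRepro` is made up, the second snippet is pasted inline, and the lines are joined with `vbLf`, whereas the real file may use different line endings or indentation. It lists every match before printing the replaced text, so it is visible where each match starts and ends.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CommentRemovalRepro
    Sub Main()
        ' Second snippet from above, joined with vbLf (an assumption;
        ' the real file may use different line endings).
        Dim content As String = String.Join(vbLf, New String() {
            "text",
            "/*",
            "* block comment",
            "*",
            "*/ /* comment1 *//* comment2 */",
            "My text to keep.",
            "/* more comments */",
            "more text"
        })

        Dim multiline_comment_remover As New Regex("/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")

        ' Print index, length and text of every match the engine reports.
        For Each m As Match In multiline_comment_remover.Matches(content)
            Console.WriteLine("match at {0}, length {1}: [{2}]", m.Index, m.Length, m.Value)
        Next

        ' Print the text that remains after all matches are removed.
        Console.WriteLine(multiline_comment_remover.Replace(content, ""))
    End Sub
End Module
```

If the inline string and the file give different results, the difference is likely in the file content rather than in the pattern.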
I would like to have the same results with VB.NET as with Notepad++ and Geany. As sln answered, my regex "should work in a weird way". The question is: why does VB.NET fail to process this regex as intended? This question is still open.
Since sln's answer got my code working, I'll accept it, although it doesn't explain why VB.NET doesn't like my regex. Thanks for all your help! I learned a lot!