
I'm trying to remove JavaScript comments via a regular expression in C# and have become stuck. I want to remove every occurrence of a double-slash (//) style comment.

My current regex is (?<!:)//[^\r\n]* which catches all comments and avoids matching http://. However, the negative lookbehind was a lazy shortcut, and of course it came back to bite me in the following test case:

var XSLPath = "//" + Node;

So I'm looking for a regular expression that will perform a lookbehind to see if an even number of double quotes (") occurs before the match. I'm not sure if this is possible. Or is there maybe a better way to do this?

Peter Mortensen
Gavin Miller

1 Answer


(Updated based on comments)

It looks like this works pretty well:

(?<=".*".*)//.*$|(?<!".*)//.*$

Running the test cases in Regex Hero shows that it matches comments the way I think it should (almost).

For instance, it'll completely ignore this line:

var XSLPath = "//" + Node;

But it's smart enough to match the comment at the end of this line:

var XSLPath = "//"; // stuff to remove

However, it doesn't correctly handle three or more quotation marks before the comment. I'm not entirely sure how to solve that without hard-coding each case. You need some way to require an even number of quotes before the match.
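One way the "even number of quotes" idea might be expressed is sketched below. It relies on .NET allowing unbounded patterns inside a lookbehind: the lookbehind walks back to the start of the line and only succeeds if everything before the // consists of complete pairs of double quotes. The `CommentStripper` class and `StripLineComments` method are hypothetical names made up for this illustration, and the sketch assumes that strings use only double quotes, contain no escaped quotes (\"), and that the source has no block comments or regex literals.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class CommentStripper
{
    // Lookbehind: from the start of the line (^ with Multiline), allow any
    // number of complete "..." pairs, then non-quote characters, up to the //.
    // An odd number of quotes before the // means we are inside a string,
    // so the lookbehind fails and the // is left alone.
    // [^\r\n]* is used instead of .*$ so a trailing \r is not swallowed.
    private static readonly Regex LineComment = new Regex(
        @"(?<=^(?:[^""\r\n]*""[^""\r\n]*"")*[^""\r\n]*)//[^\r\n]*",
        RegexOptions.Multiline);

    // Assumption: double-quoted strings only, no escaped quotes,
    // no /* */ comments, no regex literals.
    public static string StripLineComments(string source)
    {
        return LineComment.Replace(source, string.Empty);
    }
}
```

With this sketch, var XSLPath = "//" + Node; comes back unchanged, and var XSLPath = "//"; // stuff to remove is trimmed to var XSLPath = "//"; followed by the space that preceded the comment. Because every // inside a string sits after an odd number of quotes, a URL such as "http://..." in a string literal is also left alone without the (?<!:) check. Single-quoted strings and escaped quotes would still trip it up, so a small tokenizer is probably the safer route for real-world scripts.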

Steve Wortham