I need a regex that strips single-line comments from a string but leaves URLs untouched. The code should work with something like this:

//Some Comment on http://bobobo.com where bla < 5
<script type="text/javascript" src="http://bububu.com"></script>
<script type='text/javascript' src='http://bababa.com'></script>

EDIT: Of course I do not use that kind of comment in the HTML file itself. A correct example would be:

<script type="text/javascript">
   //Some Comment on http://bobobo.com where bla < 5
</script>
<script type="text/javascript" src="http://bububu.com"></script>
<script type='text/javascript' src='http://bababa.com'></script>

My bad, sorry for misleading you.

A possible solution should find "//Some Comment on http://bobobo.com where bla < 5", but not "//bububu.com">" and "//bababa.com'>".

Thanks for any hint...

Peavey
  • Do **NOT** use regex to handle html: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Marc B Nov 28 '11 at 16:58
  • @Marc B: Stripping out single line comments would be okay. – Tomalak Nov 28 '11 at 17:01
  • @Tomalak: maybe, but then you're stuck trying to figure out whether a particular comment is actually a comment, a URL buried in an attribute, or a plain-text URL. – Marc B Nov 28 '11 at 17:02
  • @Marc B: That's right. Under the assumption that lines either are comments or not (as the sample suggests), regex would indeed work. – Tomalak Nov 28 '11 at 17:03

5 Answers

1

The short answer is: don't. The reason is that single-line comments are not valid comments in HTML. They're just text tokens. You shouldn't have them in your code. Eliminate them before they are inserted into your source.


I tried to give you an alternative answer using PHP's DOMDocument and DOMXPath, but they only support XPath 1.0, and the replace function was not introduced until XPath 2.0. I'm not familiar enough with XPath 1.0 to replace a string in the DOM that way. Here's what you would need to do, though:

  1. Select all the text nodes (will ignore attributes because they aren't text nodes)
  2. Replace \s*//.* (dot does not match a newline) with ''.
  3. Insert the text back into the node.
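
The three steps above can be sketched in PHP by running the regex with preg_replace() on each text node instead of relying on an XPath replace function. This is only a minimal sketch: the function name stripCommentsFromTextNodes is made up for illustration, and a URL that appears in a text node (for example inside a JavaScript string) would still be cut off at its //.

```php
<?php
// Hypothetical helper (not part of the original answer): strips "//" comments
// from text nodes only, so URLs inside attributes are never touched.
function stripCommentsFromTextNodes(string $html): string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Step 1: select every text node (attribute values are not text nodes).
    foreach ($xpath->query('//text()') as $node) {
        // Step 2: drop "//" and everything after it up to the end of the line.
        $stripped = preg_replace('~\s*//.*~', '', $node->nodeValue);
        // Step 3: write the changed text back into the node.
        if ($stripped !== $node->nodeValue) {
            $node->nodeValue = $stripped;
        }
    }

    // loadHTML() wraps fragments in html/body elements, so the result is a full document.
    return $doc->saveHTML();
}
```

Because the src attributes are never visited, their URLs survive regardless of what the regex looks like.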
Levi Morrison
  • \s*//.* does not match if the comment is in the first line of the file. – Kaii Nov 28 '11 at 22:33
  • @Kaii I may be wrong, but I believe that is an implementation detail that may not matter. You are using replace mechanisms which generally will take that into consideration. – Levi Morrison Nov 28 '11 at 22:52
1

Thanks everyone, but finally

preg_match('!//.*?\n!', $data, $matches); 

seems to do the trick with or without spaces, tabs or new lines before the comment.

Peavey
0

The regex is ^//.

In preg_replace(), you would use the string '!^//!', for example. The ! is used as a regex delimiter to avoid leaning toothpick syndrome ('/^\/\//').

If your lines can start with spaces, you could use ^\s*//.
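
As a rough illustration, the pattern can be extended to remove the whole comment line. The sketch below assumes the multiline flag (m) so that ^ matches at every line start, uses \h* rather than \s* so that preceding blank lines are not swallowed, and adds .*\R? to consume the rest of the line including its line break. The function name stripCommentLines is hypothetical.

```php
<?php
// Hypothetical helper: removes lines that consist only of a "//" comment,
// optionally indented with spaces or tabs.
function stripCommentLines(string $data): string
{
    // ^    start of a line (because of the m flag)
    // \h*  optional horizontal whitespace
    // //   the comment marker
    // .*   the rest of the line
    // \R?  the line break itself, if there is one
    return preg_replace('!^\h*//.*\R?!m', '', $data);
}

$data = "<script type=\"text/javascript\">\n"
      . "   //Some Comment on http://bobobo.com where bla < 5\n"
      . "</script>\n"
      . "<script type=\"text/javascript\" src=\"http://bububu.com\"></script>\n";

echo stripCommentLines($data);
```

Since the pattern is anchored to the start of a line, the // inside the src URL is never a candidate.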

Tomalak
  • What about comments that are indented by some amount? – Marc B Nov 28 '11 at 17:03
  • @Marc B: See third paragraph. If things get more complicated than that, very clear rules must be defined (or regex is not an option). – Tomalak Nov 28 '11 at 17:05
0

You could also use this to strip comments that don't appear on a line by themselves (the negative lookbehind skips any // that is preceded by http:):

/(?<!http:)\/\//
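
A minimal sketch of that idea in PHP, assuming a negative lookbehind so that a // directly preceded by http: is left alone. The function name stripTrailingComments is made up for illustration, and other schemes such as https: are deliberately not handled here.

```php
<?php
// Hypothetical helper: strips "//" comments anywhere on a line,
// except when the "//" directly follows "http:".
function stripTrailingComments(string $source): string
{
    // \h*          whitespace in front of the comment
    // (?<!http:)   the "//" must not be preceded by "http:"
    // //.*         the comment marker and the rest of the line
    return preg_replace('~\h*(?<!http:)//.*~', '', $source);
}

$source = "var x = 1; // set x\n"
        . "var u = \"http://bububu.com\";\n";

echo stripTrailingComments($source);
```

A comment that itself contains a URL is still removed in full, because the match starts at the first // on the line and runs to the end of that line.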
Cfreak
0
preg_replace( '~^\h*//.*$~m', '', $html );

Replace // until the end of the line with '', with optional horizontal whitespace before it. Not tested, but something like that should work.

Berry Langerak