Hello, I am writing a Java program to remove all comments from a string that contains PHP source code. Can anyone give me a regular expression for PHP comments, please?
3 Answers
Have a look at this link: http://ostermiller.org/findcomment.html
He arrives at this solution (for /* ... */ comments):
sourcecode.replaceAll("/\\*(?:.|[\\n\\r])*?\\*/","");
For // ... and # ... comments you should be able to do something like
sourcecode.replaceAll("(//|#).*?[\\n\\r]", "");
Beware of the following type of situations though:
someString = "An example comment: /* example */";
someString = "An example comment: // example";
someString = "An example comment: # example";
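To make the caveat concrete, here is a minimal sketch of the regex approach applied to one well-behaved input and one of the problem cases above. The class and method names (NaiveCommentStripper, strip) are made up for illustration, and the patterns are a slight variant of the ones in this answer: the block pattern uses the (?s) flag instead of the (?:.|[\\n\\r]) alternation, and the line pattern stops before the line break rather than consuming it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: strips PHP comments with regex only.
class NaiveCommentStripper {
    // Block comments: (?s) lets '.' match line breaks; the lazy '*?' stops at the first "*/".
    private static final Pattern BLOCK = Pattern.compile("(?s)/\\*.*?\\*/");
    // Line comments: without DOTALL, '.' stops at the end of the line, so the newline is kept.
    private static final Pattern LINE = Pattern.compile("(//|#).*");

    static String strip(String source) {
        String withoutBlocks = BLOCK.matcher(source).replaceAll("");
        return LINE.matcher(withoutBlocks).replaceAll("");
    }

    public static void main(String[] args) {
        // Ordinary comments are removed as intended.
        System.out.print(strip("$a = 1; // set a\n$b = 2; /* set b */\n"));
        // The string literal is cut off at "//", which breaks the PHP code.
        System.out.print(strip("$s = \"An example comment: // example\";\n"));
    }
}
```

The second call returns the line with everything from // onward removed, including the closing quote and the semicolon, which is exactly the situation to beware of.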

- This will trim neither pound nor double slash comments – Explosion Pills May 05 '11 at 13:43
- Right. It is for `/* ... */` comments. Updated answer. – aioobe May 05 '11 at 13:44
- PHP allows you to start a comment with the # sign. – Explosion Pills May 05 '11 at 13:47
- You mention comments inside strings. Those might be legitimately, and significantly, used to compose javascript containing conditional compilation directives inside comments: `$s = "";` – Mike Samuel May 05 '11 at 13:56
- Also watch out for ``, as in the example listed at http://php.net/manual/en/language.basic-syntax.comments.php – BoffinBrain May 05 '11 at 14:06
This will be extremely tricky!
For a start, you have three types of comment in PHP: /* ... */, and also // and #.
But you need to exclude those which are part of a string, especially as // can appear quite often in strings, as an escaped slash character, and a # character inside a string could be a perfectly legitimate part of the text.
To compound this problem, strings can be multi-line, and in addition to single and double quotes, they can also be written using Heredoc and Nowdoc syntax (see http://php.net/manual/en/language.types.string.php), which may be particularly tricky to pick out accurately with regex. Plus, of course, you need to be sure you're within the <?php ... ?> tags.
It can probably be done, but to be honest I'd say that with all of that to deal with, you'd be far better off using a language parser than regex to try to do this.
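As a rough illustration of why a scanner that tracks state does better than a bare regex, here is a minimal sketch of a hand-written pass over the source. It is deliberately incomplete: it only understands single- and double-quoted strings, it does not handle Heredoc or Nowdoc, and it assumes the whole input is PHP code rather than checking for <?php ... ?> tags. The class and method names (PhpCommentScanner, strip) are hypothetical.

```java
// Hypothetical sketch, not a full PHP lexer: copies string literals verbatim and drops comments.
class PhpCommentScanner {
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '\'' || c == '"') {
                // Inside a quoted string: copy up to the matching quote, skipping escaped characters.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\' && j + 1 < n) {
                        j++; // step over the character after the backslash
                    }
                    j++;
                }
                j = Math.min(j + 1, n); // include the closing quote if there is one
                out.append(src, i, j);
                i = j;
            } else if (src.startsWith("/*", i)) {
                // Block comment: skip to just past the closing "*/" (or to the end if unterminated).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '#' || src.startsWith("//", i)) {
                // Line comment: skip to the line break or to a closing PHP tag, whichever comes first.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r'
                        && !src.startsWith("?>", i)) {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(strip("$s = \"a // b\"; // real comment\n$t = '#x'; /* c */ $u = 1; # end\n"));
    }
}
```

Even this small sketch keeps the comment-like text inside the two string literals intact, which the regex approach cannot do. Covering the remaining cases listed above is what pushes the problem toward a real parser.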

Like Spudley said, you cannot simply write a regex to do this. There are too many exceptional cases, like comment-like text inside strings, and line comments terminated early by closing PHP tags. In order to guarantee correctness, you would have to write an entire language parser.
However, if you're willing to use PHP itself to do the filtering for you, this question has all the answers, and it should be significantly easier and more robust. If you have PHP installed on the same machine as the Java application, you can run PHP using Runtime.exec() and capture its console output, or have PHP write the result to a file and read that file into your program afterwards.
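A minimal sketch of the "run PHP and capture its output" idea, assuming Java 11 or later and a php executable on the PATH. It uses ProcessBuilder rather than Runtime.exec(), and it relies on the PHP command-line option -w, which prints the source with comments and whitespace stripped; note that this also collapses whitespace, which may or may not be what you want. The class and method names (PhpStripRunner, buildCommand, strip) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: delegates comment stripping to the PHP command-line interpreter.
class PhpStripRunner {
    // Builds the command line: <php> -w <file>.
    static List<String> buildCommand(String phpExecutable, Path sourceFile) {
        return List.of(phpExecutable, "-w", sourceFile.toString());
    }

    static String strip(String phpExecutable, Path sourceFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(phpExecutable, sourceFile))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // let PHP errors show on our stderr
                .start();
        // Read everything PHP prints before waiting, so the process cannot block on a full pipe.
        byte[] stdout = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("php exited with code " + exitCode);
        }
        // Assumes the PHP source is UTF-8 encoded.
        return new String(stdout, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PhpStripRunner path/to/file.php
        System.out.println(strip("php", Path.of(args[0])));
    }
}
```

Since the question starts from a string rather than a file, the string would first have to be written to a temporary file before calling strip.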
