I have a bunch of files that I would like to deploy without any comments but with whitespace intact (so that I can make quick changes in production in an emergency).

The comments can be either single-line comments (#, //) or multi-line comments (/* */), at any indentation level.

I want to create a batch file that, when executed from any directory, reads all the PHP files in it and strips their comments.

I am not even sure what to try. I know I can fetch all the files with .php extension easily and loop through them. Replacing their content is easy enough as well. What I am stuck on is how to remove the comments.

Achshar
  • It may be possible to do this with batch file but it is better to choose another way (programming language). Batch files are not for such tasks - usually they call external tool for the real work. – i486 Jan 28 '16 at 13:08
  • So what would you suggest I do? Like what programming language would have such a function. – Achshar Jan 28 '16 at 13:10
  • Python and Perl are freely available and support regexes. If you want a specific tool, then there are a number of Windows ports of UNIX tools such as `awk` and `sed`. For a full tool set you might get `Cygwin`. – lit Jan 28 '16 at 13:53
  • http://stackoverflow.com/questions/643113/regex-to-strip-comments-and-multi-line-comments-and-empty-lines – Squashman Jan 28 '16 at 13:53
  • @Squashman Those answers don't deal with cases where `//` or `#` comes inside a pair of quotes. The tokenizer looks interesting, I'll have to look into it. – Achshar Jan 28 '16 at 13:58
  • @Liturgist Regex can do it but it will not work for some cases. I was more hoping there would be a function that some language had. – Achshar Jan 28 '16 at 13:59
  • @Achshar well I am sure you know that SO is not a code writing service. I at least searched the website to find something close for you. You could at least attempt to write something in some language. I can tell you that writing a batch file would not even be in my top 5 solutions for this. Batch files cannot do Regular Expression string replacement. You should re-tag your question with something other than batch-file and cmd. – Squashman Jan 28 '16 at 14:06
  • @Squashman done. And yeah I spent an hour googling this, and read all the other answers on SO. None of them fit my requirement and I am not knowledgeable enough to write my own regex yet, which is what brings me to SO. – Achshar Jan 28 '16 at 14:15
  • I know it is old, but will this work? http://stackoverflow.com/questions/503871/best-way-to-automatically-remove-comments-from-php-code – lit Jan 28 '16 at 14:28
  • Yeah tokenizer seems to be the only option right now. If nothing else comes up that's what I'll go with. – Achshar Jan 28 '16 at 20:19

1 Answer

There are multiple ways to do it, but I think the most efficient is to use regular expressions. I'm not a regex guru, but I know that you can use a stream editor such as sed (or any powerful text editor such as Notepad++, Sublime Text, etc.) to replace the matched text with an empty string.

For example, in Sublime Text, I've tested this regular expression, which finds multi-line comments as well as single-line // and # comments:

\/\*([\s\S]*?)\*/|//.*|#.*

Quick explanation:

  • The regexp is made of three alternatives, separated by the OR symbol (the pipe, |).
  • The first alternative means "something starting with /*, then as few characters as possible (including line breaks), and finishing with */".
  • The second means "// and every character following it up to the end of the line".
  • The third means "# and every character following it up to the end of the line".

Once you've entered this search expression in Sublime Text, you can replace everything it matches with an empty replacement string.
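
The same search-and-replace can be tried outside an editor. Here is a minimal Python sketch, assuming Python's `re` module is acceptable in place of Sublime Text. The function name `strip_comments_naive` and the sample input are made up for illustration. The pattern is the one above, unchanged, so it knows nothing about string literals.

```python
import re

# The pattern from the answer, unchanged: a /* ... */ block, or // to end of
# line, or # to end of line.
COMMENT_RE = re.compile(r"\/\*([\s\S]*?)\*/|//.*|#.*")


def strip_comments_naive(source: str) -> str:
    """Replace every match with an empty string (hypothetical helper)."""
    return COMMENT_RE.sub("", source)


# Made-up sample input, not taken from the question.
sample = (
    "<?php\n"
    "/* header\n"
    "   block */\n"
    "$a = 1; // trailing\n"
    "# hash comment\n"
    "    $b = 2;\n"
)

print(strip_comments_naive(sample))
```

Line breaks and indentation outside the comments are left alone, so a line that held only a comment becomes an empty line rather than disappearing.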

  • Will this deal with comments that are in the middle of the line? or if `//` or `#` appears inside a double or single quote? That would mess with it. – Achshar Jan 28 '16 at 13:12
  • Yes, you're right, comment markers that appear inside quotes will be removed too! Comments in the middle of a line (/* */ style) should be removed fine! – Ludovic Lemarinel Jan 28 '16 at 19:59
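
One common way around the quote problem raised in these comments is to let the regex match quoted strings as well and put them back untouched. Below is a rough Python sketch of that idea. It is not a PHP tokenizer, and the names `TOKEN_RE` and `strip_comments` are invented for illustration. It assumes plain single- and double-quoted strings with backslash escapes. It does not handle heredoc/nowdoc strings, inline HTML outside the PHP tags, or PHP 8 `#[...]` attributes.

```python
import re

# Quoted strings, allowing backslash escapes such as \" or \'.
STRING = r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'"
# Block comments, then // and # comments up to the end of the line.
COMMENT = r"/\*.*?\*/|//[^\n]*|#[^\n]*"

# Strings are listed first so that a // or # inside quotes is consumed as
# part of the string instead of being seen as the start of a comment.
TOKEN_RE = re.compile(f"(?P<string>{STRING})|(?P<comment>{COMMENT})", re.DOTALL)


def strip_comments(source: str) -> str:
    """Keep matched strings as they are, drop matched comments (sketch only)."""
    return TOKEN_RE.sub(lambda m: m.group("string") or "", source)


# Made-up sample input, not taken from the question.
sample = "$u = \"http://x\"; // note\n$s = 'a # b'; /* c */ $t = 1;\n"
print(strip_comments(sample))
```

For real deployments the tokenizer route mentioned in the comments is likely the safer choice, since it follows PHP's own lexing rules.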