I have a Perl script that strips comments from other Perl scripts:

open (INFILE, $file);
@data = <INFILE>;

foreach $data (@data)
{
    $data =~ s/#.*/ /g;
    print "$data";
}

The problem is, this code also removes the shebang line:

#!/usr/bin/perl

How can I strip comments except for the shebang?

ThisSuitIsBlackNot
tidibur
  • You could simply create a variable called `$skip` with a value of `1`. In the first time the code enters the loop, if `$skip` is equal to one, then change the value to 0 and `continue`. – Jerry Mar 26 '14 at 15:23
  • Your code will also strip code like `$#array`, which is not a comment. – ThisSuitIsBlackNot Mar 26 '14 at 15:26
  • thank you sir but i would really like to skip the first occurrence and not the first line.. – tidibur Mar 26 '14 at 16:15
  • @tidibur The shebang cannot be on any other line than the first line. – TLP Mar 26 '14 at 16:58
  • So how did it go? Did you tell your teacher that he is a noob? :) – TLP Mar 29 '14 at 20:16
  • @TLP, since no one got the problem, our professor gave us a very simple problem which was included in the lecture. he just wanted to see if we learned anything during the course. he just made us write a program to count the number of lines in a given text file, but in return only gave us a passing grade of 75%. props to everyone who helped. i think i'll pass. :) – tidibur Mar 30 '14 at 16:40
  • @tidibur That is a trivial problem in comparison. :) `perl -ne'END { print $. }' file.txt` in a one-liner. So he never admitted to giving you an impossible problem? Not even give a solution to show a "correct" example? That's weak.. :) – TLP Mar 30 '14 at 17:32

4 Answers

13

Writing code to strip comments is not trivial, since the # character can be used in other contexts than just comments. Use perltidy instead:

perltidy --delete-block-comments --delete-side-comments foo

will strip # comments (but not POD) from file foo and write the output to foo.tdy. The shebang is not stripped.
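To see why the naive substitution from the question is unsafe, here is a minimal sketch that applies it to a few lines in which `#` does not start a comment. The helper name `naive_strip` and the sample lines are illustrative assumptions, not part of perltidy or of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitution as in the question,
# applied to a single line (without the replacement space).
sub naive_strip {
    my ($line) = @_;
    $line =~ s/#.*//;    # remove everything from the first '#' onward
    return $line;
}

# Sample lines in which '#' is NOT the start of a comment.
my @samples = (
    'my $last = $#array;',    # last index of an array
    'my $str = "#foo";',      # '#' inside a double-quoted string
    'my $q = q/#foo/;',       # '#' inside a q// string
    's#/#!#g;',               # '#' used as a regex delimiter
);

for my $line (@samples) {
    printf "%-22s => %s\n", $line, naive_strip($line);
}
```

Every sample is cut off at its first `#`, so none of the results is valid Perl any more. This is the kind of damage a real parser such as perltidy avoids.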

ThisSuitIsBlackNot
  • @tidibur And I would like a pony. Regexes are not up to the task, and you're wasting your time to try. You can't even handle trivial cases reliably with regexes, due to string quoting and so forth. – Vector Gorgoth Mar 26 '14 at 16:18
  • @tidibur Then you will have to write a regex that accounts for all the different contexts in which `#` can be used: block comments (`# foo`), side comments (`my $foo; # foo`), strings (`my $foo = '#foo';`, `my $foo = q/#foo/;`, etc.), here-docs, arrays (`my $last = $#array;`), and so on. Don't reinvent the wheel (poorly). – ThisSuitIsBlackNot Mar 26 '14 at 16:19
  • @ThisSuitIsBlackNot i tried using this to skip the first occurrence: `foreach $data(@data) { $data =~ s/#(?!\!\/)//g; print "$data"; }` although i was able to save the path from being removed, the following occurrences only removed the "#" instead of the whole comment. – tidibur Mar 26 '14 at 16:27
  • @tidibur That will badly mangle your source code, since you only remove the `#` and leave the comment text. Don't use a regex. This is a very complex problem, but it has already been solved. – ThisSuitIsBlackNot Mar 26 '14 at 16:33
  • @ThisSuitIsBlackNot yes it is very complex. i hope to be able to find the solution or fail. :( – tidibur Mar 26 '14 at 16:40
  • @tidibur I gave you a solution: perltidy. – ThisSuitIsBlackNot Mar 26 '14 at 16:41
  • @ThisSuitIsBlackNot i am only supposed to use regex. if i use perltidy, it would be void. would placing the occurrence of `#!/usr/bin/perl` in a variable and then `#comments` in another variable work? – tidibur Mar 26 '14 at 16:46
  • @tidibur Is this a homework assignment? If so, ask your teacher if you're expected to handle the `#` inside strings, `__DATA__` blocks, POD, here-docs, arrays, etc., since it would be impossible to handle all of those cases with a regex. – ThisSuitIsBlackNot Mar 26 '14 at 16:53
  • @ThisSuitIsBlackNot its a practical exam with given problems and then randomly drafted. hope this doesnt show up. we are only allowed to use regexes. – tidibur Mar 26 '14 at 17:00
  • @tidibur That is a useless exam, created by a teacher with no knowledge in Perl. The problem is extremely complex, filled with edge cases. Even a trivial regex like yours will be extremely destructive on Perl code. You would have to restrict the style of the comment, for example `/^# /` (first on line, followed by space). But even this could fail in countless ways, such as for multiline strings or regexes. Tell your teacher that and he will be impressed. Or insulted, but hey, yolo. – TLP Mar 26 '14 at 17:06
  • @TLP hahaha! thanks but wouldnt do that if not drunk enough. :) – tidibur Mar 26 '14 at 17:12
  • Interesting note about `PPI` is that `PPI::Token::Comment` matches the shebang line – Zaid Mar 26 '14 at 17:32
  • @tidibur, What ThisSuitIsBlackNot means by "complex" is "pages and pages long". It would take weeks for an expert to write. I agree with TLP. Whoever wrote that exam made a mistake. How can you find it unacceptable to fail to handle shebang lines, but find it acceptable to fail to handle `$#array`, `"#"`, `s#/#!#`, `#line 1000`, etc.? YOUR PROGRAM WOULD FAIL TO WORK ON YOUR OWN PROGRAM! – ikegami Mar 26 '14 at 18:20
3

There is a method PPR::decomment() that can be used:

use strict;
use warnings;
use PPR;

my $document = <<'EOF';
print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n'; # The comment
return $function && $function !~ /^[\s{}#]/;
EOF

my $res = PPR::decomment( $document );
print $res;

Output:

print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n'; 
return $function && $function !~ /^[\s{}#]/;
Håkon Hægland
  • Looks good though how would I use it when I have the string in a variable while looping through a file? – albert Apr 17 '20 at 15:38
  • You could try running `PPR::decomment()` on the variable? – Håkon Hægland Apr 17 '20 at 15:40
  • Sorry to nag (as non perl programmer) but how would I do this say the string is in `$cleanline`? – albert Apr 17 '20 at 15:42
  • No problem. Have you tried `PPR::decomment( $cleanline )` ? – Håkon Hægland Apr 17 '20 at 15:43
  • I had just sent my nagging comment when I saw the `$res` line, so I wanted to add: Oh stupid me, just `my $res = PPR::decomment( $cleanline);`. Testing will take some time, I probably have to install PPR – albert Apr 17 '20 at 15:45
  • I just encountered a small problem, the behavior is according to the PPR documentation but in my case not really wanted. According to the PPR documentation: "The subroutine will fail if the argument wasn't valid Perl code, in which case it returns undef and sets $PPR::ERROR to indicate where the invalid source code was encountered." and I run into this problem when I have a line like: `sub fie { # comment`, any brilliant ideas? – albert Apr 18 '20 at 11:09
  • I just formulated a follow up question in: https://stackoverflow.com/questions/61289457/strip-all-comment-from-perl-file-using-perl-ppr – albert Apr 18 '20 at 13:00
2

perltidy is the way to do this if it's anything but an exercise. There's also PPI for parsing Perl; you could use its PPI::Token::Comment tokens to do something more complicated than just stripping.

However, to answer your direct question: don't try to solve everything in a single regex. Instead, break your problem up into logical pieces. In this instance, if you want to skip the first line, process the file line by line, which conveniently sets the current line number in $.

use strict;
use warnings;
use autodie;

my $file = '... your file...';

open my $fh, '<', $file;

while (<$fh>) {
    if ($. != 1) {
        s/#.*//;
    }

    print;
}

Disclaimer

The approach of using regexes for this problem is definitely flawed, as everyone has already said. However, I'm going to give your instructor the benefit of the doubt and assume that she/he is aiming to teach by intentionally giving you a problem that is outside the purview of a regex's ability. Good luck finding all of those edge cases and figuring out how to deal with them.

Whatever you do, don't try to solve them using a single regex. Break your problem up and use lots of if's and elsif's.
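As a rough illustration of splitting the problem into if/elsif cases, here is a sketch that removes only whole-line comments and leaves everything else alone. The helper name `strip_whole_line_comments` and the sample input are assumptions made for this example. It deliberately leaves side comments in place, and it is still wrong for here-docs, multi-line strings, and `#line` directives, so treat it as a starting point for an exercise rather than a working comment stripper.

```perl
use strict;
use warnings;

# Hypothetical helper: takes source lines, returns the lines that survive.
# Only lines consisting solely of a comment are dropped.
sub strip_whole_line_comments {
    my @lines = @_;
    my @out;
    my $in_pod  = 0;    # true while inside a POD block (=head1 ... =cut)
    my $in_data = 0;    # true after __END__ or __DATA__
    my $line_no = 0;

    for my $line (@lines) {
        $line_no++;
        if ($in_data) {
            push @out, $line;                   # data section: copy verbatim
        }
        elsif ($line_no == 1 && $line =~ /^#!/) {
            push @out, $line;                   # shebang: keep
        }
        elsif ($in_pod) {
            push @out, $line;                   # POD is not a '#' comment
            $in_pod = 0 if $line =~ /^=cut\b/;
        }
        elsif ($line =~ /^=[a-zA-Z]/) {
            $in_pod = 1;                        # a POD block starts here
            push @out, $line;
        }
        elsif ($line =~ /^__(?:END|DATA)__\s*$/) {
            $in_data = 1;                       # everything after this is data
            push @out, $line;
        }
        elsif ($line =~ /^\s*#/) {
            next;                               # whole-line comment: drop
        }
        else {
            push @out, $line;                   # code, possibly with a side comment
        }
    }
    return @out;
}

# Sample input; split /^/ keeps the trailing newline on each line.
my @source = split /^/, <<'EOF';
#!/usr/bin/perl
# a block comment
my @array = (1, 2, 3);
my $last = $#array;   # side comment stays
    # indented comment
print "#not a comment\n";
EOF

print strip_whole_line_comments(@source);
```

For the sample input this prints the shebang, the two `my` lines (side comment included), and the `print` line; the two whole-line comments are gone. Note that the sketch would still damage its own source if run on itself, because it also drops the comment-like lines inside the here-doc.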

Miller
  • Aside from the obvious problems of breaking `$#array`, `"#"`, `s#/#!#g` and `#line 1000`, it doesn't remove comments (if any) on first line. – ikegami Mar 26 '14 at 18:07
  • I should have said "Aside from the obvious problem that the program does not work on itself"! – ikegami Mar 26 '14 at 18:15
  • @Miller short but clean code. very well done but this wont work if the shebang is placed on the 2nd line. although its pure common sense that it should always be first. ill give this a shot. thank you – tidibur Mar 26 '14 at 18:21
  • @tidibur, You are mistaken. There cannot be such a thing as a shebang on the second line. By definition, the shebang (`#!`) must be the very first two characters of the file. – ikegami Mar 26 '14 at 18:29
  • My solution isn't meant to be a "solution". It's meant to suggest an approach to work on this, but you'll definitely need to add more logic. Anyway, walking away from this, as I just wanted to give a nudge in a direction your instructor was obviously intending. – Miller Mar 26 '14 at 18:44
  • thanks to the both of you. will sure try to use both of your tips. :( – tidibur Mar 26 '14 at 18:49
2

Since you asked for a regex solution:

'' =~ /(?{
   system("perltidy", "--delete-block-comments", "--delete-side-comments", $file);
   die "Can't launch perltidy: $!\n"                   if $? == -1;
   die "perltidy killed by signal ".( $? & 0x7F )."\n" if $? & 0x7F;
   die "perltidy exited with error ".( $? >> 8 )."\n"  if $? >> 8;
})/;

It seems like you are leaning towards using the following:

#!/usr/bin/perl
while (<>) {
   if ($. != 1) {
      s/#.*//;
   }
   print;
}

But it doesn't work on itself:

$ chmod u+x stripper.pl

$ stripper.pl stripper.pl >stripped_stripper.pl

$ chmod u+x stripped_stripper.pl

$ stripped_stripper.pl stripper.pl
Substitution pattern not terminated at ./stripped_stripper.pl line 4.

$ cat stripped_stripper.pl
#!/usr/bin/perl
while (<>) {
   if ($. != 1) {
      s/
   }
   print;
}

It also fails to remove comments on the first line:

$ cat >first.pl
# This is my first Perl program!
print "Hello, World!\n";

$ stripper.pl first.pl
# This is my first Perl program!
print "Hello, World!\n";
ikegami
  • thank you for the help but this way too complex for me to comprehend. i have only been learning perl for about a week. – tidibur Mar 26 '14 at 18:23
  • It does exactly the same as if the first line and last line weren't there: It runs perltidy, and throws and reports any errors. You insisted it had to be in a regex, so there you go. If you can't understand that, how could you possibly write a Perl parser? – ikegami Mar 26 '14 at 18:25
  • yep. sorry for that. i hope @miller's solution would suffice though. and thank you for your effort. :) – tidibur Mar 26 '14 at 18:27
  • @Miller's solution doesn't work on Miller's solution. – ikegami Mar 26 '14 at 18:27
  • I tried it and it worked. but only if the "shebang" is in the first line.. hmmm – tidibur Mar 26 '14 at 18:29
  • @tidibur, Miller's solution doesn't have a shebang line. I was referring to the fact that it clobbers line 11. – ikegami Mar 26 '14 at 18:31
  • thats because he used other functions. try this one: `#!/usr/bin/perl $file = $ARGV[0]; open F, $file; while (<F>) { if ($. != 1) { s/#.*//; } print; }` that is the one i used – tidibur Mar 26 '14 at 18:35
  • @tidibur, That solution does not work on itself. (And you should just use `while (<>)`, which will read from the file passed as arg or from STDIN if none) – ikegami Mar 26 '14 at 18:36
  • dude im telling u its working. tried it several times already. do try the solution i posted. – tidibur Mar 26 '14 at 18:40
  • I've updated my answer to show that it's not working – ikegami Mar 26 '14 at 18:40
  • now i understand what you're saying. your right man. I thought i got it right already. :(( thanks for showing that dude – tidibur Mar 26 '14 at 18:44