
Is there a way to syntax check a Perl program without running perl? The well-known answer is 'no'. Without starting the full perl runtime to evaluate code for imports and so on, you cannot tell whether a program's syntax is correct.

But what if you wanted an approximate answer? A syntax checker that says either 'bad' or 'maybe'. If 'bad', the program is definitely not valid perl code (assuming a vanilla perl interpreter). If 'maybe', it looks OK, but only perl itself can say for sure.

A program which always prints 'maybe' is clearly such a checker, but not a very useful one. A better attempt is to use PPI. There may be some valid Perl programs which are rejected by PPI, but if that occurs it is accepted as a PPI bug (I think).

Digression: Why is this useful? One use might be a kwalitee check. To catch various "d'oh" moments, the version control system at $WORK runs all Perl code through perl -c before allowing the commit. (I am not recommending this as a general practice, just noting that it has been useful at our site.) But perl -c is unsafe, since it executes code (as it must). Using a conservative syntax checker instead would be safer, at the expense of some cases where the checker says 'maybe' but the program is not in fact valid Perl.

What I really want (end of digression): But in fact safety is not the motivating factor for my current application. I am interested in speed. Is there a way to roughly check and reject badly-formed Perl code before going to the expense of spinning up a whole Perl interpreter? PPI is slower than perl itself, so not a good candidate. You could write an approximate Perl grammar and use a parser generator to build a simple C program which accepts or rejects pseudo-Perl.

My application is 'delta debugging': you start with a large program which has a certain property (segfaulting, for example) and knock out sections of it while preserving that property. I use http://delta.tigris.org/, which works in a simple-minded, line-oriented way. Many of the test cases it generates will not be valid Perl code, and the delta debugging would go faster if these could be eliminated quickly, before the full perl executable is started.

Since the overhead of starting the perl interpreter is probably the biggest part of the time taken, you could implement some kind of server which listens on a socket, accepts program text, and returns 'bad' or 'maybe' by attempting to eval() the text or run it through PPI.
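
For illustration, a minimal sketch of what such a server might look like. The port number and the wrap-in-a-sub check are my own assumptions, not a tested design:

#!perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen for program text on a socket and answer 'bad' or 'maybe'.
my $server = IO::Socket::INET->new(
   LocalPort => 9999,    # arbitrary choice of port
   Listen    => 5,
   ReuseAddr => 1,
) or die "listen: $!";

while (my $client = $server->accept) {
   my $code = do { local $/; <$client> };   # slurp until the client closes
   # Compiling the text inside an anonymous sub avoids running it,
   # though BEGIN blocks and use statements still execute; so this is
   # no safer than perl -c, just cheaper than starting a fresh interpreter.
   my $ok = eval "sub { $code }";
   print {$client} $ok ? "maybe\n" : "bad\n";
   close $client;
}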

Another way to speed things up would be to make perl fail faster. Normally it prints all the syntax errors and diagnostics it can find; if it instead stopped at the first one, some time would be saved.

But I do like the idea of a grammar for almost-Perl which could be checked by a simple C program. Does such a thing exist?

(Related: Perl shallow syntax check? (i.e. do not check the syntax of imports), but my question is more about speed, and I am happy to have a rough check which accepts some invalid programs, as long as it does not reject valid ones.)

Ed Avis
  • Re "There may be some valid Perl programs which are rejected by PPI, but if that occurs it is accepted as a PPI bug (I think).", Unless it's because PPI doesn't execute code at run-time. You make it sound like the PPI's limitation is fixable, but it's not. – ikegami May 08 '14 at 13:29
  • Re "Is there a way to roughly check and reject badly-formed Perl code before going to the expense of spinning up a whole Perl interpreter?" Sounds like a premature optimization. What makes you think that `perl -c` would be slower? I would think the opposite! – ikegami May 08 '14 at 13:30
  • Re "Why is this useful? One use might be a kwalitee check before allowing the commit": you are looking for perlcritic, which will tell you "this code seems nice" or "you've written mediocre crap". However, that won't help you with weeding out invalid code. – amon May 08 '14 at 14:29
  • This is certainly the most interesting Perl question I've seen all week, if not longer. A shame I don't have a good answer for it. – LeoNerd May 08 '14 at 14:52
  • ikegami is right: if the perl code depends on compile-time code execution in unusual ways (more than just the usual importing of symbols) then PPI will never be able to understand it. I suppose in principle a modified PPI could produce 'maybe' and stop as soon as it sees BEGIN blocks or other things it doesn't understand. As for whether an alternative would be faster than perl -c, I am guessing that a pseudo-Perl grammar compiled into a C program would be quite a bit faster; I agree that PPI would be slower. amon: for commit checking I just want a basic sanity check, not full perlcritic. – Ed Avis May 08 '14 at 15:15

2 Answers


Given source filters, prototypes, and the Perl (5.14+) keyword API, imports can radically alter what syntax is valid and what is not. If you import anything, then such a check would be of very little use.

If you import nothing, then you can probably safely load all your external modules with require instead of use, and perl -c will become lightning fast (because require is processed at runtime).
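
For example (Some::Heavy::Module is a made-up name):

use Some::Heavy::Module;         # loaded while perl -c compiles the file

# versus:

require Some::Heavy::Module;     # loaded at run time; perl -c only parses this line
Some::Heavy::Module->import;     # call import yourself if you need its exports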

PPI is not especially useful here, because it takes a very forgiving, best-guess approach to parsing, and so will accept wildly invalid input without complaint:

#!perl
use strict;
use warnings;
use PPI::Document;
use PPI::Dumper;

# PPI happily parses this broken snippet and dumps a parse tree
PPI::Dumper->new(
   PPI::Document->new(\"foo q[}")
)->print;

Perl::Lexer may be more helpful, though it will only detect code so broken that it cannot even be tokenized. My previous example happens to be one of those, so this does complain:

#!perl
use strict;
use warnings;
use Perl::Lexer;

# the unbalanced q[} cannot be tokenized, so scan_string complains
print $_->inspect, $/
   for @{ Perl::Lexer->new->scan_string("foo q[}") };

Even so, things like the Perl keyword API, Devel::Declare, and source filters are applied prior to lexing, so if you import any modules that take advantage of these techniques, Perl::Lexer will be stuck. (Any of these techniques can easily make foo q[} valid syntax.)
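
To make that concrete, here is a sketch of a source filter that turns the broken-looking text into valid code before the parser ever sees it (RewriteFoo and script.pl are invented names; Filter::Simple does the work):

# RewriteFoo.pm
package RewriteFoo;
use Filter::Simple;
# Rewrite the literal text 'foo q[}' in the caller's source into
# ordinary Perl before perl parses it. (\\n keeps a literal \n in
# the substituted text.)
FILTER { s/\Qfoo q[}\E/print "rewritten!\\n";/g };
1;

# script.pl
use RewriteFoo;
foo q[}    # rewritten by the filter before perl sees it

script.pl compiles and runs, even though no purely static checker would accept it.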

Compiler::Lexer and Compiler::Parser may be of some use. The following dumps core:

#!perl
use strict;
use warnings;
use Compiler::Lexer;
use Compiler::Parser;

# "eg.pl" is only a filename label used in token metadata; the source is the string
my $t = Compiler::Lexer->new("eg.pl")->tokenize("foo q[}");
my $a = Compiler::Parser->new->parse($t);   # dumps core on this input

If you correct the mismatched quotes in foo q[} to foo q[], it no longer dumps core. That is a result of sorts. ;-)

Ultimately, the answer depends on what sort of code you're writing and what class of errors you're hoping to spot. perl -c will give you a fairly rigorous syntax check. Perl::Lexer may be faster, but there are big classes of errors it won't spot. Compiler::Lexer/Compiler::Parser might be useful in the future but seem to behave erratically right now.

Personally, I'd stick with perl -c, and if it's too slow, try to cut down the number of modules you load at compile time in favour of run-time loading.

TL;DR: if you want static analysis, don't use Perl.

tobyink
  • Thanks. Note that I am not wanting to do static analysis but to have a faster way to do delta debugging, where bits of code are removed at random in the hope of producing a minimal test case. So the input program is fixed - rewriting it to load modules at run time, etc, would negate any time saving from the delta debugging going faster. My requirement is something quick and dirty, as a first pass before running perl itself. So 'perl -c' is pointless since I may as well just go straight to running the generated test case. – Ed Avis May 08 '14 at 15:21

If all you want is a quick compilability check, have a perl process that stays running to check each file for you:

perl -MFile::Slurp -lne'print 0+!! eval "sub {" . read_file($_) . "}"'
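
This keeps one interpreter alive and reads file names on standard input; for each name it slurps the file, wraps the text in "sub { ... }", and evals the string, which compiles the code without running it (although BEGIN blocks still execute). It prints 1 for a file that compiles and 0 for one that does not. A hypothetical invocation, with candidate.pl standing in for a generated test case:

echo candidate.pl | perl -MFile::Slurp -lne'print 0+!! eval "sub {" . read_file($_) . "}"'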
ysth