I know this is incorrect. I just want to know how perl parses this.

I was playing around with perl. What I wanted was perl -ne; what I typed was perl -ie. The behavior was kind of interesting, and I'd like to know what happened.

$ echo 1 | perl -ie'next unless /g/i'

So perl Aborted (core dumped) on that. Reading perl --help I see -i takes an extension for backups.

-i[extension]     edit <> files in place (makes backup if extension supplied)

For those who don't know, -e is just eval. So I'm thinking one of three things could have happened. Either it was parsed as

  1. perl -i -e'next unless /g/i': -i gets undef, and the rest goes as the argument to -e
  2. perl -ie 'next unless /g/i': -i gets the argument e, and the rest is left hanging like a file name
  3. perl -i"-e'next unless /g/i'": the whole thing is the argument to -i

When I run

$ echo 1 | perl -i -e'next unless /g/i'

The program doesn't abort. This form is unambiguously parsed as option 1, yet it gives a different result, which leads me to believe that in my original command 'next unless /g/i' is not being parsed as a literal argument to -e.

So what is it? Well playing around with a little more, I got

$ echo 1 | perl -ie'foo bar'
Unrecognized switch: -bar  (-h will show valid options).

$ echo 1 | perl -ie'foo w w w'
... works fine; I guess it reads it as `perl -ie'foo' -w -w -w`

Playing around with the above, I try this...

$ echo 1 | perl -ie'foo e eval q[warn "bar"]'
bar at (eval 1) line 1.

Now I'm really confused. So how is Perl parsing this? Lastly, it seems you can actually get a Perl eval from within just -i. Does this have security implications?

$ perl -i'foo e eval "warn q[bar]" '
Evan Carroll
  • It would appear that Perl is gobbling up everything following the `-i` switch as its argument. – JRFerguson May 22 '12 at 20:12
  • @JRFerguson not to me, it looks like perl just assumes -i takes all characters up until the first whitespace, and then it seems like it runs its own form of command line parsing on the rest of the input. – Evan Carroll May 22 '12 at 20:28
  • Indeed that's what I'm saying. – JRFerguson May 22 '12 at 20:41
  • btw, no security implications. `perl -e'...'` cannot do anything that you couldn't do with `perl somefile.pl`. – ikegami May 22 '12 at 21:37
  • But, being able to set the argument to `perl -i` should not provide you with the same abilities as being able to set the argument to `perl -e` or `perl $file`. – Evan Carroll May 22 '12 at 21:40

2 Answers

Quick answer

Shell quote-processing is collapsing and concatenating what it thinks is all one argument. Your invocation is equivalent to

$ perl '-ienext unless /g/i'

It aborts immediately because perl parses this argument as containing -u, which triggers a core dump where execution of your code would begin. This is an old feature that was once used for creating pseudo-executables, but it is vestigial in nature these days.

What appears to be a call to eval is the misparse of -e 'ss /g/i'.
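The quote concatenation can be checked in isolation. The sketch below uses Python's `shlex.split`, which follows POSIX shell quoting rules closely enough for this case; it only shows what argument list perl receives, not how perl then interprets it.

```python
import shlex

# The shell removes the quotes and joins the quoted text onto the
# unquoted prefix, so -ie'...' arrives at perl as one argument.
collapsed = shlex.split("perl -ie'next unless /g/i'")

# With a space after -i, the shell produces two separate arguments.
separate = shlex.split("perl -i -e'next unless /g/i'")

print(collapsed)  # ['perl', '-ienext unless /g/i']
print(separate)   # ['perl', '-i', '-enext unless /g/i']
```

The first form hands perl a single argument beginning with -i, which is why the two invocations behave differently.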

First clue

B::Deparse can be your friend, provided you happen to be running on a system without dump support.

$ echo 1 | perl -MO=Deparse,-p -ie'next unless /g/i'
dump is not supported.
BEGIN { $^I = "enext"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    (('ss' / 'g') / 'i');
}

So why does unle disappear? If you’re running Linux, you may not have even gotten as far as I did. The output above is from Perl on Cygwin, and the error about dump being unsupported is a clue.

Next clue

Of note from the perlrun documentation:

-u

This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.

Working hypothesis and confirmation

Perl’s argument processing sees the entire chunk as a single cluster of options because it begins with a dash. The -i option consumes the next word (enext), as we can see in the implementation for -i processing.

case 'i':
    Safefree(PL_inplace);
    [Cygwin-specific code elided -geb]
    {
        const char * const start = ++s;
        while (*s && !isSPACE(*s))
            ++s;

        PL_inplace = savepvn(start, s - start);
    }
    if (*s) {
        ++s;
        if (*s == '-')      /* Additional switches on #! line. */
            s++;
    }
    return s;

For the backup file’s extension, the code above from perl.c consumes up to the first whitespace character or end-of-string, whichever comes first. If any characters remain, the first must be whitespace, so the code skips it; if the next character is a dash, it skips that too. In Perl, you might write this logic as

# The extension may be empty, and the trailing space and dash are both optional.
if ($$s =~ s/\Ai(\S*)(?:\s-?)?//) {
  my $extension = $1;
  return $extension;
}

Then, all of -u, -n, -l, and -e are valid Perl options, so argument processing eats them and leaves the nonsensical

ss /g/i

as the argument to -e, which perl parses as a series of divisions. But before execution can even begin, the archaic -u causes perl to dump core.
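The walk through the cluster can be sketched as a small Python model. This is a simplified illustration, not perl's implementation: `parse_cluster` and `SIMPLE_FLAGS` are hypothetical names, only a handful of argument-less switches are modelled, and the separate handling of spaces between switches is left out.

```python
SIMPLE_FLAGS = set("lnpuw")  # assumption: small subset of argument-less switches

def parse_cluster(arg):
    """Split one '-...' argument into (switch, value) pairs, roughly as described above."""
    s = arg[1:]                      # drop the leading dash
    seen = []
    while s:
        c = s[0]
        if c == "i":
            rest = s[1:]
            n = 0
            while n < len(rest) and not rest[n].isspace():
                n += 1               # extension runs to the first whitespace
            seen.append(("i", rest[:n]))
            s = rest[n:]
            if s:
                s = s[1:]            # skip that one whitespace character
                if s.startswith("-"):
                    s = s[1:]        # and an optional dash after it
        elif c == "e":
            seen.append(("e", s[1:]))  # -e takes everything that is left
            s = ""
        elif c in SIMPLE_FLAGS:
            seen.append((c, None))
            s = s[1:]
        else:
            # Anything else is outside this sketch (real perl has more cases).
            raise ValueError(f"Unrecognized switch: -{s}")
    return seen

result = parse_cluster("-ienext unless /g/i")
print(result)
# [('i', 'enext'), ('u', None), ('n', None), ('l', None), ('e', 'ss /g/i')]
```

Under these assumptions, the model also reproduces the question's `-ie'foo bar'` case: after the extension efoo is consumed, the next character is b, which is rejected as an unrecognized switch.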

Unintended behavior

Even stranger is what happens if you put two spaces between next and unless:

$ perl -ie'next  unless /g/i'

the program attempts to run. Back in the main option-processing loop we see

case '*':
case ' ':
    while( *s == ' ' )
      ++s;
    if (s[0] == '-')        /* Additional switches on #! line. */
        return s+1;
    break;

The extra space terminates option parsing for that argument. Witness:

$ perl -ie'next  nonsense -garbage --foo' -e die
Died at -e line 1.

but without the extra space we see

$ perl -ie'next nonsense -garbage --foo' -e die
Unrecognized switch: -onsense -garbage --foo  (-h will show valid options).

With an extra space and dash, however,

$ perl -ie'next  -unless /g/i'
dump is not supported.
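The space-handling branch on its own can be modelled in a few lines of Python. The function name `space_case` is hypothetical, and the inputs are the text left over after -i has consumed enext and the single space that follows it.

```python
def space_case(s):
    """Model of the `case ' '` branch: s begins with a space inside a switch cluster.

    Returns the text where switch parsing resumes, or None when parsing
    of this argument stops.
    """
    s = s.lstrip(" ")        # while (*s == ' ') ++s;
    if s.startswith("-"):
        return s[1:]         # more switches follow the dash
    return None              # otherwise the rest of the argument is ignored

stopped = space_case(" nonsense -garbage --foo")
resumed = space_case(" -unless /g/i")
print(stopped)  # None
print(resumed)  # unless /g/i
```

In the first call nothing after the spaces starts with a dash, so the rest of the argument is dropped and the later -e die runs. In the second, parsing resumes at unless, which brings back -u and the dump.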

Design motivation

As the comments indicate, the logic is there for the sake of harsh shebang (#!) line constraints, which perl does its best to work around.

Interpreter scripts

An interpreter script is a text file that has execute permission enabled and whose first line is of the form:

#! interpreter [optional-arg]

The interpreter must be a valid pathname for an executable which is not itself a script. If the filename argument of execve specifies an interpreter script, then interpreter will be invoked with the following arguments:

interpreter [optional-arg] filename arg...

where arg... is the series of words pointed to by the argv argument of execve.

For portable use, optional-arg should either be absent, or be specified as a single word (i.e., it should not contain white space) …
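As a rough illustration of why a whole bundle of switches can arrive as one argument, here is a Python sketch of how the argument list might be built from a #! line. It assumes Linux-style behavior, where everything after the interpreter path is passed as a single optional-arg; the function name `shebang_argv` is hypothetical, and other systems may split the line differently.

```python
def shebang_argv(first_line, script_path, args=()):
    """Build the interpreter's argv from a #! line, assuming one optional-arg at most."""
    body = first_line[2:].strip()   # drop "#!" and surrounding whitespace
    parts = body.split(None, 1)     # interpreter, then the remainder as ONE piece
    return parts + [script_path, *args]

argv = shebang_argv("#!/usr/bin/perl -w -i\n", "./script.pl")
print(argv)  # ['/usr/bin/perl', '-w -i', './script.pl']
```

Under that assumption perl receives '-w -i' as a single argument, so it has to split on the space itself for both switches to take effect.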

Greg Bacon
  • If the shell thinks it is all one argument, `'-ienext unless /g/i'`, why does perl break it up on whitespace? Why would Perl assume that if `asdf jkl` was sent as one argument, it should be read as `-asdf` `-jkl`? Shouldn't the shell do the job of preparing the arguments for perl, and perl just worry about handling the arguments as they sit? – Evan Carroll May 22 '12 at 20:35
  • Unfortunately, using `Deparse` on Mac OS X (10.6.8) running Perl 5.14.2 isn't revealing in this case. With or without it one simply gets "Abort trap". – JRFerguson May 22 '12 at 20:47
  • @Evan Carroll, Re "why does perl break it up on whitespace?" I suspect it has to do with Perl's handling of options on the #! line. Some OSes (e.g. Linux) pass everything after the command as a single argument. Splitting on spaces would allow `#!/usr/bin/perl -w -i` to work. – ikegami May 22 '12 at 21:16
  • @Greg Bacon, I wouldn't call Linux an ancient system. – ikegami May 22 '12 at 21:19
  • @ikegami interesting! I didn't know that the [shebang was executed so awkwardly](http://stackoverflow.com/a/4304187/124486) and lacks standardization. – Evan Carroll May 22 '12 at 21:32
  • Actually, I'm not sure how Linux does it. I could be wrong there. – ikegami May 22 '12 at 21:48
Three things to know:

  1. '-x y' means -xy to Perl (for some arbitrary options "x" and "y").

  2. -xy, as is common for Unix tools, is a "bundle" representing -x -y.

  3. -i, like -e, absorbs the rest of the argument. Unlike -e, it considers a space to be the end of the argument (as per #1 above).

That means

-ie'next unless /g/i'

which is just a fancy way of writing

'-ienext unless /g/i'

unbundles to

-ienext -u -n -l '-ess /g/i'
  ^^^^^             ^^^^^^^
----------         ----------
val for -i         val for -e

perlrun documents -u as:

This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump() operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.
ikegami
  • Note: I didn't know #3, and I had forgotten about #1, so this is indeed a very tricky question, especially since simply adding `-MO=Deparse` didn't help on this system. – ikegami May 22 '12 at 21:14
  • `'-x y' means -xy to Perl (for some arbitrary options "x" and "y").` is this true? `perl '-w edie'` doesn't work. maybe what you mean to say is that `'-x y' means -xy to Perl (for some arbitrary options "x" that requires an argument, and an arbitrary argument "y").`, as in the case of `perl '-e die'` – Evan Carroll May 23 '12 at 15:53
  • @Evan Carroll, ah, so it's specific to -i? odd! – ikegami May 23 '12 at 17:09