
I have been using the Perl command line with a -ne option for years, largely to process text files in ways that sed can't. Example:

cat in.txt | perl -ne "s/abc/def/; s/fgh/hij/; print;" > out.txt

I have no idea where I learned this, and have only today read perlrun and found there are other forms (perl -pe for example).

What else should I know about perl -ne?

brian d foy
Anon Gordon
  • `perl -pe` is appropriate for your example code. Use `-pe` and drop the `print` statement. – mob Feb 06 '10 at 07:28
  • Gee, I learned about it by reading the output of `perl --help`. I tend to use that option on all programs I use, just to know what's available. – Rob Kennedy Feb 06 '10 at 07:30
  • Well, it's all in perlrun really. Every Perl hacker should read the documentation. :) – brian d foy Feb 06 '10 at 07:30
  • You can also skip the "cat in.txt |" part and just put in.txt at the end of the command line, as the "wokka" operator (<>) will pick up files named at the end of the command line, *or* stdin. – Roboprog Feb 06 '10 at 08:12
  • Perl hackers (and non-Perl hackers as well) should know about UUOC, the Useless Use Of Cat (http://sial.org/howto/shell/useless-cat/) ... – mjy Feb 06 '10 at 13:36
  • I should have said this was a really simplified example. I wouldn't normally just cat something like that; a perl -ne is more likely to occur in the middle of a bunch of pipelined commands of various types. Thanks for your comments, though. – Anon Gordon Feb 08 '10 at 06:44

7 Answers


perl -ne 'CODE' is equivalent to the program

while (<>) {
    CODE
}
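To make the equivalence concrete, here is the question's one-liner written out as the loop that -n wraps around it. This is a sketch: a temporary file from File::Temp with made-up contents stands in for in.txt, and @ARGV is set by hand the way the command line would set it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up contents standing in for in.txt.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "abc fgh\nno match here\n";
close $tmp or die "close $path: $!";

# perl -ne 's/abc/def/; s/fgh/hij/; print;' in.txt
# runs as if it were this loop:
my @printed;
{
    local @ARGV = ($path);
    while (<>) {
        s/abc/def/;
        s/fgh/hij/;
        print;               # what the one-liner does
        push @printed, $_;   # kept only so the result can be checked
    }
}
```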

perl -ane 'CODE' and perl -F/PATTERN/ -ane 'CODE' are also good idioms to know about. They are equivalent to

while (<>) {
    @F = split ' ', $_;
    CODE
}

and

while (<>) {
    @F = split /PATTERN/, $_;
    CODE
}
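perlrun documents the implicit split done by -a as split ' ', which is not quite the same as split /\s+/: both split on runs of whitespace, but split ' ' ignores leading whitespace while /\s+/ produces an empty first field. A small sketch with one made-up line of input:

```perl
use strict;
use warnings;

my $line = "  7/udp   echo\n";        # made-up input with leading blanks

my @awk_style = split ' ', $line;     # what -a does by default
my @regex     = split /\s+/, $line;   # leading whitespace yields an empty field

print scalar(@awk_style), " fields: @awk_style\n";
print scalar(@regex),     " fields: (", join('|', @regex), ")\n";
```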

Examples: an advanced grep, and a filter on the fields of a CSV file:

perl -ne 'print if/REGEX1/&&!/REGEX2/&&(/REGEX3/||/REGEX4/&&!/REGEX5/)' input

perl -F/,/ -ane 'print if $F[2]==4&&$F[3]ge"2009-07-01"&&$F[3]lt"2009-08-01"' file.csv
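Written out, the second command is a loop that splits each line on commas before testing fields 2 and 3. A sketch that reads three made-up CSV rows from a string instead of file.csv; note that without -l the last field keeps its newline, which happens to be harmless for these string comparisons:

```perl
use strict;
use warnings;

# Made-up rows standing in for file.csv.
my $csv = "a,x,4,2009-07-15\nb,y,4,2009-08-02\nc,z,3,2009-07-20\n";
open my $in, '<', \$csv or die "in-memory input: $!";

my @kept;
while (<$in>) {
    my @F = split /,/, $_;   # what -F/,/ -a does; $F[3] still ends in "\n"
    push @kept, $_
        if $F[2] == 4 && $F[3] ge "2009-07-01" && $F[3] lt "2009-08-01";
}
print @kept;
```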


A particularly clever example that uses mismatched braces is here.
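One well-known trick of this kind (not necessarily the one meant above) is the `}{` sequence, sometimes called the "Eskimo greeting". Because -n pastes your code inside the loop's braces, a `}{` in the code closes the loop early and opens a block that runs once after all input has been read; for example, perl -lne '$count++ }{ print $count' file counts lines. A sketch of the program that results, reading a made-up three-line string instead of file:

```perl
use strict;
use warnings;

my $input = "a\nb\nc\n";   # made-up input
open my $in, '<', \$input or die "in-memory input: $!";

my $count = 0;
# Roughly what -n builds from '$count++ }{ print $count':
LINE: while (<$in>) {
    $count++
}{
    # A bare block after the loop: it runs once, at the end.
    print "$count\n";
}
```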

Community
mob

There is one important thing to know about perl -ne and perl -pe scripts: they implicitly use <>.

"Why is that important?" you might ask.

The magic <> operator uses the two-argument form of open. If you recall, two-argument open takes the mode and the filename together in one argument, so an old-style call like open FILE, $foo is vulnerable to manipulation of the file mode through the contents of $foo. A particularly interesting mode in this context is |, which opens a handle to a pipe to a process you execute.

You might be thinking "Big deal!", but it is.

  • Imagine a cron job executed by root to munge log files in some directory.
  • The script is invoked as script *.
  • Imagine a file in that directory named |rm -rf /.

What happens?

  1. The shell expands the * and we get script file_1 file_2 '|rm -rf /' file_4
  2. The script processes file_1 and file_2.
  3. Next it opens a handle to STDIN of rm -rf /.
  4. Lots of disk activity follows.
  5. file_4 no longer exists, so we can't open it.

Of course, the possibilities are endless.

You can read more discussion of this issue at Perlmonks.

The moral of the story: be careful with the <> operator.

FWIW, I just confirmed that this is still an issue with perl 5.10.0.
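If you write the loop yourself, three-argument open with an explicit '<' mode avoids the problem, since the file name is never parsed for mode characters. A minimal sketch, assuming a scratch directory from File::Temp and a made-up file named '|echo pwned' (harmless here, because it is only ever opened literally). Perl 5.22 and later also offer <<>>, a variant of <> that opens each @ARGV entry with three-argument open.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory holding a file whose *name* begins with '|'.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
my $name = '|echo pwned';
open my $make, '>', $name or die "create '$name': $!";
print {$make} "just data\n";
close $make or die "close '$name': $!";

# A safer stand-in for `while (<>)`: explicit '<' mode, so a leading
# '|' (or '>', '+<', ...) in a file name is taken literally.
my @lines;
{
    local @ARGV = ($name);
    for my $file (@ARGV) {
        open my $fh, '<', $file or do { warn "skipping $file: $!\n"; next };
        while (my $line = <$fh>) {
            push @lines, $line;
        }
        close $fh;
    }
}
chdir $start or die "chdir $start: $!";
print @lines;
```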

daotoad
  • That's some serious stuff. Has anybody said what the downside of using the 3 argument open, with an explicit "read" mode, would be if used to implement the wokka <> operator? – Roboprog Apr 27 '12 at 06:05
  • OK, I read the perlmonks thread. A vocal minority insist that the insanity is a feature. "You expect that this just opens a list of files to read, but it does more, and we don't care that you wish for something safe, orthogonal and easy". There needs to be some kind of "sane diamond" pragma / module. I upvoted this answer, above my own, and hope that it gets upvoted to the top of the list eventually. – Roboprog Apr 27 '12 at 06:27
  • @Roboprog there's a CPAN module: [`ARGV::readonly`](http://search.cpan.org/~davidnico/ARGV-readonly-0.01/lib/ARGV/readonly.pm). –  Mar 18 '15 at 09:09

You can specify more than one -e clause. Sometimes I have a command line that starts growing as I refine a search / extract / mangulation operation. If you mistype something, you will get a "line number" telling you which -e has the error.

Of course, some might argue that if you have more than one or two -e clauses, maybe you should put whatever it is into a script, but some stuff really is just throwaway, so why bother.

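perl -ln -e 'if (/good/)' -e '{ system "echo $_ >> good.txt"; }' \
-e 'elsif (/bad/)' -e '{ system "echo $_ >> bad.txt"; }' \
-e 'else' -e '{ system "echo $_ >> ugly.txt"; }' in.txt another.txt etc.txt

(The -l chomps each line. Without it, the newline at the end of $_ terminates the shell command, so echo prints to the terminal and the >> redirection runs as a separate, empty command.)

Presumably you would do something less trivial than grep / egrep into 3 files :-)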
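The -e clauses are joined with newlines into a single program, so the command above is an ordinary if/elsif/else inside the -n loop. Written out as a script it might look like the sketch below, which swaps the per-line echo for filehandles opened once (a change from the original command, which also keeps the shell from interpreting each line). The sample input and the temporary directory are made up for the sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Made-up input; in the one-liner this comes from in.txt etc.
my $input = "a good day\na bad day\nan ugly day\n";
open my $in, '<', \$input or die "in-memory input: $!";

# Open each output file once, instead of one echo process per line.
my $dir = tempdir(CLEANUP => 1);
my %out;
for my $kind (qw(good bad ugly)) {
    open $out{$kind}, '>>', "$dir/$kind.txt" or die "$kind.txt: $!";
}

# The joined -e clauses form this if/elsif/else chain.
while (<$in>) {
    if    (/good/) { print { $out{good} } $_ }
    elsif (/bad/)  { print { $out{bad} }  $_ }
    else           { print { $out{ugly} } $_ }
}
close $_ for values %out;
```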

Roboprog

The -i option lets you make the changes in place:

 perl -i -pe 's/abc/def/; s/fgh/hij/' file.txt

or save a backup:

 perl -i.bak -pe 's/abc/def/; s/fgh/hij/' file.txt
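Within a script, the same switch is exposed as the $^I variable (the in-place edit extension), and perlfaq5 shows setting it together with @ARGV before a while (<>) loop. A minimal sketch along those lines, editing a made-up file in a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Made-up file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file.txt";
open my $fh, '>', $file or die "create $file: $!";
print {$fh} "abc\nfgh\nxyz\n";
close $fh or die "close $file: $!";

# Roughly: perl -i.bak -pe 's/abc/def/; s/fgh/hij/' file.txt
{
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s/abc/def/;
        s/fgh/hij/;
        print;    # with $^I set, print writes to the new file.txt
    }
}
```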
brian d foy
jojo

I like to think of perl -n as picking out specific bits of the input and perl -p as map for all lines of the input.

As you've observed, it's possible to get the effect of -p with -n, and we can emulate the other way around:

$ echo -e "1\n2\n3" | perl -pe '$_="" if $_ % 2 == 0'
1
3

Skipping lines with next would seem more natural, but -p wraps code in

LINE:
while (<>) {
    ...     # your program goes here
} continue {
    print or die "-p destination: $!\n";
}

By design, next runs continue blocks:

If there is a continue BLOCK, it is always executed just before the conditional is about to be evaluated again. Thus it can be used to increment a loop variable, even when the loop has been continued via the next statement.
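As a sketch, here is the -p loop shape reproduced by hand and run twice over the same made-up input: once with next in the body and once with the $_ = "" trick from above. Input comes from an in-memory string instead of <>, and output goes to strings so the two results can be compared.

```perl
use strict;
use warnings;

my $input = "1\n2\n3\n";

# Attempt 1: `next` skips the body, but the continue block still prints.
my $with_next = '';
{
    open my $in,  '<', \$input     or die "in-memory input: $!";
    open my $out, '>', \$with_next or die "in-memory output: $!";
    LINE:
    while (<$in>) {
        next LINE if $_ % 2 == 0;
    } continue {
        print {$out} $_ or die "-p destination: $!\n";
    }
    close $out;
}

# Attempt 2: blank out $_, so the continue block prints nothing for it.
my $with_blank = '';
{
    open my $in,  '<', \$input      or die "in-memory input: $!";
    open my $out, '>', \$with_blank or die "in-memory output: $!";
    LINE:
    while (<$in>) {
        $_ = "" if $_ % 2 == 0;
    } continue {
        print {$out} $_ or die "-p destination: $!\n";
    }
    close $out;
}

print $with_blank;
```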

The -l switch has two handy effects:

  1. With -n and -p, automatically chomp each input record.
  2. Set $\ so every print implicitly adds a terminator.

For example, to grab the first 10 UDP ports mentioned in /etc/services you might write

perl -ane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head

but oops:

7/udp9/udp11/udp13/udp17/udp19/udp37/udp39/udp42/ud...

Better:

$ perl -lane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head
7/udp
9/udp
11/udp
13/udp
17/udp
19/udp
37/udp
39/udp
42/udp
53/udp
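Spelled out, -lane combines the pieces described above: -l's chomp and $\, -a's split, and -n's loop. A sketch of that loop for this one-liner, reading two made-up lines in /etc/services format from a string and collecting the output in another string:

```perl
use strict;
use warnings;

# Two made-up lines in /etc/services format.
my $services = "echo\t7/tcp\necho\t7/udp\n";
open my $in, '<', \$services or die "in-memory input: $!";

my $output = '';
open my $out, '>', \$output or die "in-memory output: $!";

{
    local $\ = $/;               # -l: every print appends "\n"
    while (<$in>) {
        chomp;                   # -l with -n/-p: strip the input newline
        my @F = split ' ', $_;   # -a: autosplit into @F
        print {$out} $F[1] if $F[1] =~ /udp/;
    }
}
close $out;
print $output;
```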

Remember that -n and -p can be in the shebang line too, so to save the above one-liner as a script:

#! /usr/bin/perl -lan

BEGIN {
  @ARGV = ("/etc/services") unless @ARGV;
  open STDOUT, "|-", "head" or die "$0: head failed: $!";
}

print $F[1] if $F[1] =~ /udp/
Greg Bacon

I often use sed or awk, but I really like this killer Perl pattern-matching feature:

$ cat my-input.txt
git 111 HERE 2222 voila 333
any 444 HERE none start 555 HERE 6
svn 777 aaaa 8888 nothing
two 222 HERE 9999 HERE 0000

$ perl -nle 'print $a if (($a)=/HERE ([0-9]+)/)' my-input.txt
2222
6
9999
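The trick is that a list assignment in scalar (here boolean) context returns the number of elements on its right-hand side: 1 when the capture matched, 0 when it did not, while still filling the variable. A sketch of the same loop over the sample lines above, using a lexical $n instead of $a (a style choice: $a works in the one-liner but is normally reserved for sort):

```perl
use strict;
use warnings;

# The sample lines from my-input.txt above.
my @lines = (
    "git 111 HERE 2222 voila 333",
    "any 444 HERE none start 555 HERE 6",
    "svn 777 aaaa 8888 nothing",
    "two 222 HERE 9999 HERE 0000",
);

my @found;
for (@lines) {
    # True only when the capture group matched; $n holds the capture.
    if (my ($n) = /HERE ([0-9]+)/) {
        push @found, $n;
    }
}
print "$_\n" for @found;
```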
oHo

My favorite reference for Perl one-liners (and the top hit on Google for that phrase) covers perl -ne: http://novosial.org/perl/one-liner/

Peter N. Steinmetz
Philip Durbin