Two notes before I start:
- The trailing | in the pattern will cause every line to match. It needs to be removed.
- /3346|10989|95459|139670|2239329|3195595|3210017/ will match 9993346, so you need to anchor the pattern.
Fixes for these problems are present in all of the following solutions.
You can pass data to a program through:
- Argument list
- Environment
- An open file descriptor (e.g. stdin, but fd 3 or higher could also be used) to a pipe
- External storage (file, database, memcache daemon, etc)
You can still use the argument list. You just need to remove the argument from @ARGV before the loop starts, either by using BEGIN or by avoiding -n.
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -I {} perl -ane'
BEGIN { $p = shift(@ARGV); }
print if $F[3] =~ /^(?:$p)\z/;
' {} 15AM171H0N15000GAJK5
Perl also has rudimentary built-in switch parsing in the form of -s, which you could utilize.
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -I {} perl -sane'print if $F[3] =~ /^(?:$p)\z/' -- -p={} 15AM171H0N15000GAJK5
xargs doesn't seem to have an option to set an environment variable, so taking that approach gets a little complicated.
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -I {} sh -c '
P="$1" perl -ane'\''print if $F[3] =~ /^(?:$ENV{P})\z/'\'' 15AM171H0N15000GAJK5
' dummy {}
It's weird to involve xargs for a single line. If we avoid xargs, we can turn the above (ugly) command inside out, giving something quite nice.
P="$(
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }'
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5
By the way, you don't need a second perl to split only the matching lines.
P="$(
perl -ne'
push @p, (split)[1] if /ENGPacific Beach\s\s/;
END { print join "|", @p; }
' 15AM171H0N15000GAJK5
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5
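That said, I think interpolating $ENV{P} into the pattern for every line should be avoided to speed things up. The /o modifier tells Perl to interpolate and compile the pattern only once.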
P=... perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/o' 15AM171H0N15000GAJK5
From there, I see two possible speed improvements. (Test to be sure.)
Avoiding splitting entirely in the last perl.
P=... perl -ne'
BEGIN { $re = qr/^(?:\S+\s+){3}(?:$ENV{P})\s/o; }
print if /$re/o;
' 15AM171H0N15000GAJK5
Avoiding regular expressions entirely in the last perl.
P=... perl -ane'
BEGIN { %h = map { $_ => 1 } split /\|/, $ENV{P} }
print if $h{$F[3]};
' 15AM171H0N15000GAJK5