How can I grab multiple lines after a matching line in Perl?

Question

I'm parsing a large file in Perl line-by-line (terminated by \n), but when I reach a certain keyword, say "TARGET", I need to grab all the lines between TARGET and the next completely empty line.

So, given a segment of a file:

Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line
\n

It should become:
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

The reason I'm having trouble is I'm already going through the file line-by-line; how do I change what I delimit by midway through the parsing process?

score 23 · Accepted Answer · answered Jun 24 '09 at 20:11

You want something like this:

my @grabbed;
while (<FILE>) {
    if (/TARGET/) {
        push @grabbed, $_;      # keep the matching line itself
        while (<FILE>) {        # keep reading from the same handle
            last if /^$/;       # stop at the first empty line
            push @grabbed, $_;
        }
    }
}

dave4420

Ah, thanks, I wasn't sure if while inside another while was ok in perl :) – Dirk Jun 24 '09 at 20:15
@Michael It is just another readline call, so, yes, it is OK. perldoc -f readline – Sinan Ünür Jun 24 '09 at 20:22
If the handle isn't pointing to an actual file, but rather something like STDIN, you can have the inner while get an eof and terminate and then the outer while continue to read until *it* gets an eof. Try it with: perl -wle'print "read a"; while (<>) { print "read b"; while (<>) { print "read b" } print "read a" }' – ysth Jun 25 '09 at 00:37

Greg Bacon · Answer 2 · 2009-06-24T21:28:25.657

14

The range operator is ideal for this sort of task:

$ cat try
#! /usr/bin/perl

while (<DATA>) {
  print if /\btarget\b/i .. /^\s*$/
}

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Nope
Line 7 Target
Line 8 Yep

Nope again

$ ./try
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Line 7 Target
Line 8 Yep
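
If you would rather collect the lines than print them, and leave out the empty line that ends each range, one possible variation is sketched below. It relies on the documented behaviour of `..` in scalar context: it returns a sequence number while the range is active, and the number for the last line of the range has "E0" appended. The sample text, the in-memory filehandle, and the names `$text`, `$seq`, and `@grabbed` are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $text = <<'END';
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Nope
END
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @grabbed;
while (my $line = <$fh>) {
    # Scalar-context range operator: false outside the range, otherwise
    # a sequence number (1, 2, 3, ...); the final one looks like "4E0".
    my $seq = ($line =~ /\btarget\b/i) .. ($line =~ /^\s*$/);
    next unless $seq;          # not inside a range
    next if $seq =~ /E0\z/;    # skip the empty line that closes the range
    push @grabbed, $line;
}
close $fh;

print @grabbed;
```

Dropping the `E0` check gives the same lines as the `print if ... .. ...` version above, terminator included.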

edited Jun 24 '09 at 21:28

answered Jun 24 '09 at 20:39

score 10 · Answer 3 · answered Jun 24 '09 at 20:17

The short answer: the line delimiter in Perl is $/, so when you hit TARGET, you can set $/ to "\n\n", read the next "line", then set it back to "\n"... et voilà!

Now for the longer one: if you use the English module (which gives sensible names to all of Perl's magic variables), then $/ is called $RS or $INPUT_RECORD_SEPARATOR. If you use IO::Handle, then IO::Handle->input_record_separator("\n\n") will work.

And if you're doing this as part of a bigger piece of code, don't forget either to localize the change (using local $/; in the appropriate scope) or to set $/ back to its original value of "\n".
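
A minimal sketch of that idea, assuming the input has the shape shown in the question. The helper name `read_until_blank`, the sample text, and the in-memory filehandle are hypothetical and only there to make the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $text = <<'END';
Line 3
Line 4 TARGET
Line 5 Grab this line
Line 6 Grab this line

Line 7
END
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

# Hypothetical helper: read one record that ends at the next empty line.
sub read_until_blank {
    my ($handle) = @_;
    local $/ = "\n\n";             # restored automatically when the sub returns
    my $chunk = <$handle>;
    return '' unless defined $chunk;
    $chunk =~ s/\n+\z/\n/;         # trim the empty line that ended the record
    return $chunk;
}

my @blocks;
while (my $line = <$fh>) {         # $/ is still "\n" here
    next unless $line =~ /TARGET/;
    push @blocks, $line . read_until_blank($fh);
}
close $fh;

print @blocks;
```

Keeping the `local` inside a small sub means the outer loop never sees the changed separator. One caveat of this approach: if the empty line comes immediately after the TARGET line, the newline that ended the TARGET line has already been consumed, so the "\n\n" separator will not match there and the read continues to the next empty line.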

mirod

I like how you explained the way to do this without giving code. It is a little longer, but in the end the reader is better off to do something similar in the future. – Ape-inago Jun 24 '09 at 20:37
If you `use English;` (which I don't, but whatever floats your boat) be sure to `use English '-no_match_vars';`, otherwise you'll get a performance hit with regular expressions. – Chris Lutz Jun 24 '09 at 21:34
@Chris Lutz you are right, I just assumed that if you use English, then you would read the docs. – mirod Jun 25 '09 at 04:58

score 4 · Answer 4 · answered Jul 31 '09 at 18:02

From perlfaq6's answer to "How can I pull out lines between two patterns that are themselves on different lines?"

You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.

Here's another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
    # now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}

score 2 · Answer 5 · answered Jun 24 '09 at 20:11

my $buffer = '';
while (<FILE>)
{
    if (/target/i)
    {
        $buffer .= $_;          # keep the matching line
        while (<FILE>)
        {
            $buffer .= $_;      # note: the terminating empty line is kept too
            last if /^\n$/;
        }
    }
}

user105033

telesphore4 · Answer 6 · 2009-06-24T20:51:02.490

1

use strict;
use warnings;

my $inside = 0;
my $data = '';
while (<DATA>) {
    $inside = 1 if /Target/;
    last if /^$/ and $inside;
    $data .= $_ if $inside;
}

print '[' . $data . ']';

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Next Line

Edit to fix the exit condition as per the note below.

edited Jun 24 '09 at 20:51

answered Jun 24 '09 at 20:23

I'd be against flags, but this is one of the clearest I've seen so far! – Ape-inago Jun 24 '09 at 20:36
d0h! I should change that to "last if /^$/ and $inside;" to handle the case where there is a blank line before the target. – telesphore4 Jun 24 '09 at 20:48

score 0 · Answer 7 · answered Jun 24 '09 at 20:17

If you don't mind ugly auto-generated code, and assuming you just want lines between TARGET and the next empty line, and want all the other lines to be dropped, you can use the output of this command:

s2p -ne '/TARGET/,/^$/p'

(Yes, this is a hint that this problem is usually much more easily solved in sed. :-P)

C. K. Young

See gbacon's answer. This could be written as "perl -ne 'print if /TARGET/ .. /^$/'" which is more or less exactly what you have. – user55400 Jun 25 '09 at 07:14
Thanks for the heads-up! I seldom come back to check for other people's answers, so it's good that there is a clearly superior answer given. – C. K. Young Jun 25 '09 at 11:23

score 0 · Answer 8 · answered Jun 24 '09 at 20:21

If you only want one loop (modifying Dave Hinton's code):

my @grabbed;
my $grabbing = 0;
while (<FILE>) {
    if (/TARGET/ ) {
       $grabbing = 1;
    } elsif( /^$/ ) {
       $grabbing = 0;
    }
    if ($grabbing) {
        push @grabbed, $_;
    }
}

Graeme Perrow

take a look at some of the other examples here... $flags should be avoided as this is 'perl' code, and as such you should be using perl-isms. – Ape-inago Jun 24 '09 at 20:35
@Ape-inago Can you explain? (I just noticed that I use 'flags' in code elsewhere) – Dirk Jun 24 '09 at 20:41
Use flags if that's what makes sense to you. 'Any level of language proficiency is acceptable in Perl culture. We won't send the language police after you. A Perl script is "correct" if it gets the job done before your boss fires you.' - Larry Wall – ysth Jun 25 '09 at 00:41

score 0 · Answer 9 · answered Mar 28 '16 at 09:02

while (<IN>) {
    print OUT if /Target/ .. /^$/;
}

Sumathi Gokul
