
I'm quite new to Perl and I have some problems in skipping lines using a foreach loop. I want to copy some lines of a text file to a new one.

When the first words of a line are FIRST ITERATION, skip two more lines and print everything following until the end of the file or an empty line is encountered.

I've tried to find a similar post, but nobody talks about working with text files.

This is the form I thought of

use 5.010;
use strict;
use warnings;

open( INPUT, "xxx.txt" ) or die("Could not open log file.");
open( OUT, ">>yyy.txt" );

foreach my $line (<INPUT>) {

    if ( $line =~ m/^FIRST ITERATION/ ) {

        # print OUT
    }
}

close(OUT);
close(INPUT);

I tried using next and $line++ but my program prints only the line that begins with FIRST ITERATION.

I may try to use a for loop but I don't know how many lines my file may have, nor do I know how many lines there are between "First Iteration" and the next empty line.

Borodin
René
    `for` and `foreach` are synonyms in Perl. Both support C-style and map-style syntax. What you need is a `while` loop. Also note that comments in Perl are made using `#`. – simbabque Mar 21 '16 at 10:24

6 Answers

5

The simplest way is to process the file a line at a time and keep a state flag, which is set to 1 if the current line begins with FIRST ITERATION and to 0 if it is blank. Otherwise, it is incremented if it is already positive, so that it provides a count of the line number within the current block

This solution expects the path to the input file as a parameter on the command line and prints its output to STDOUT, so you will need to redirect the output to the file on the command line as necessary

Note that the regex pattern /\S/ checks whether there is a non-blank character anywhere in the current line, so not /\S/ is true if the line is empty or all blank characters

use strict;
use warnings;

my $lines = 0;

while ( <> ) {

    if ( /^FIRST ITERATION/ ) {
        $lines = 1;
    }
    elsif ( not /\S/ ) {
        $lines = 0;
    }
    elsif ( $lines > 0 ) {
        ++$lines;
    }

    print if $lines > 3;
}

This can be simplified substantially by using Perl's built-in range operator, which keeps its own internal state and returns a sequence number: 1 on the line where the range starts, 2 on the next, and so on, with a false value outside the range. So the above may be written

use strict;
use warnings;

while ( <> ) {
    my $s = /^FIRST ITERATION/ ... not /\S/;
    print if $s and $s > 3;
}

And the last can be rewritten as a one-line command line program like this

$ perl -ne '$s = /^FIRST ITERATION/ ... not /\S/; print if $s and $s > 3' myfile.txt
Borodin
  • ++ `not /\S/` reads oddly ("match on not non-blank") but looks nicer/clearer than `/^$/` and `/\A\z/` and is slightly and usefully different. I am trying to understand `/^\p{Blank}/` here: http://stackoverflow.com/q/36162764/2019415 if you can help :-\ – G. Cito Mar 22 '16 at 18:50
  • @G.Cito: I disagree. A non-blank line is `/\S/` so a blank line is `not /\S/`. You're clearly from the neighbourhood that don't like `unless`! Just `/\A\z/` won't do—the equivalent to `not /\S/` is `/\A\s*\z/`, and I think *that* is far uglier, and clearly prone to error as you made one yourself. Of course you may choose whichever form you prefer, but please do remember any others that may need to read and work with your code – Borodin Mar 22 '16 at 18:58
  • :-D they *all* read oddly ! But `not /\S/` is the best of the bunch so ++. `/^$/` has become so familiar to perl programmers it's almost a real (unreadable) word. BUT no one expects the "Spanish Inquisition": `my $string = "\n", "Inquisition time!\n\n" ; say "whee" if $string =~ /^$/ ;` – G. Cito Mar 22 '16 at 19:06
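
To make the difference between the blank-line tests discussed in these comments concrete, here is a small sketch that applies each pattern to a few sample lines. The sub names and the sample strings are made up for illustration; they are not from the answer.

```perl
use strict;
use warnings;

# Four candidate tests for "this line is blank"; each returns 1 or 0.
# (The sub names are hypothetical, chosen only for this comparison.)
sub blank_not_S { return $_[0] !~ /\S/      ? 1 : 0 }
sub blank_caret { return $_[0] =~ /^$/      ? 1 : 0 }
sub blank_A_z   { return $_[0] =~ /\A\z/    ? 1 : 0 }
sub blank_A_s_z { return $_[0] =~ /\A\s*\z/ ? 1 : 0 }

# Sample lines as they would arrive from <>, line ending included.
my @samples = (
    [ 'newline only'     => "\n" ],
    [ 'spaces + newline' => "   \n" ],
    [ 'CRLF'             => "\r\n" ],
    [ 'text'             => "text\n" ],
);

for my $sample (@samples) {
    my ( $label, $line ) = @$sample;
    printf '%-17s not /\S/: %d  /^$/: %d  /\A\z/: %d  /\A\s*\z/: %d' . "\n",
        $label,
        blank_not_S($line), blank_caret($line),
        blank_A_z($line),   blank_A_s_z($line);
}
```

On these samples `not /\S/` and `/\A\s*\z/` agree everywhere, `/^$/` only accepts the bare newline, and `/\A\z/` accepts none of them because the line still carries its newline.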
2

Use an additional counter that tells you when to print a line. Something like this:

my $skipCounter = 3;

And in foreach:

if ($skipCounter == 2) {
    # print OUT
}
if ( $line =~ m/^FIRST ITERATION/) {
    $skipCounter = 0;
}

$skipCounter++;
DevilaN
2

Advice: Use STDIN and STDOUT instead of files; this will allow you to change them without modifying the script.

Code:

#!/usr/bin/perl
use 5.010;
use strict;
use warnings;


open(INPUT, "xxx.txt" ) or die "Could not open log file: $!.";
open(OUT, ">yyy.txt") or die "Could not open output file: $!";
while( my $line = <INPUT> )
{
    if ( $line =~ m/^FIRST ITERATION/) {
        <INPUT>; # skip line
        <INPUT>; # skip line
        while( $line = <INPUT>) # print till empty line
        {
            last if $line eq "\n";
            print OUT $line;
        }
    };
};
close (OUT);
close (INPUT);
  • If you change to three-argument `open` and lexical file handles, this is a really good answer. But you don't need `;` after blocks, and the line ending might not be `\n`. It could be a Windows or Mac file. – simbabque Mar 21 '16 at 10:51
  • @simbabque In text mode, [Perl translates `\n` to the appropriate value for your platform](http://perldoc.perl.org/perlport.html#Newlines) (and vice versa). – ThisSuitIsBlackNot Mar 21 '16 at 14:30
  • @simbabque (Of course, that only works if the file already has the native line endings for whatever platform you're running on; the above code won't work if you're reading a CRLF file on *nix, so it should really check for vertical whitespace instead of `\n` to be more robust.) – ThisSuitIsBlackNot Mar 21 '16 at 14:58
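
A minimal sketch of what these comments suggest, assuming the same skip-two-then-copy logic as the answer: three-argument `open`, lexical file handles, and a blank-line test that also accepts CRLF endings. The sub name `copy_blocks` and the sample text are hypothetical, and in-memory handles stand in for the real files so the example runs on its own.

```perl
use strict;
use warnings;

# Copy each block that follows a FIRST ITERATION line from $in to $out,
# leaving out the marker line and the two lines after it.
# (copy_blocks is a name invented for this sketch.)
sub copy_blocks {
    my ( $in, $out ) = @_;
    while ( my $line = <$in> ) {
        next unless $line =~ /^FIRST ITERATION/;
        for ( 1 .. 2 ) {
            defined( my $skipped = <$in> ) or return;    # skip two lines
        }
        while ( defined( $line = <$in> ) ) {
            last if $line !~ /\S/;    # blank line, with LF or CRLF ending
            print {$out} $line;
        }
    }
    return;
}

# Demo on in-memory handles with CRLF line endings. With real files this
# would be open( my $in, '<', 'xxx.txt' ) and open( my $out, '>', 'yyy.txt' ).
my $sample = "header\r\nFIRST ITERATION\r\nskip1\r\nskip2\r\n"
           . "foo\r\nbar\r\n\r\ndon't print this\r\n";
my $copied = '';
open( my $in,  '<', \$sample ) or die "Could not open input: $!";
open( my $out, '>', \$copied ) or die "Could not open output: $!";
copy_blocks( $in, $out );
close($out) or die "Could not close output: $!";
close($in);
print $copied;
```

Testing `$line !~ /\S/` instead of `$line eq "\n"` is the design choice that matters here: a line holding only `\r\n` contains no non-whitespace character, so it still ends the block when a Windows file is read on a Unix system.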
1

You're on the right track. What you need to use is the flip-flop operator, `..` (which is basically the range operator). It will toggle for you between two matches, so you get everything in between. After that, it's a matter of keeping track of the lines you want to skip.

So basically we are checking for FIRST ITERATION and for an empty line, and grab everything in between those. $skip is used to remember how many lines were skipped. It starts at 0 and gets incremented on every line inside the flip-flop if block; the first three of those lines (the FIRST ITERATION line itself and the two lines after it) are skipped. Note that the empty line that ends the range is still inside the flip-flop, so it is printed as well. In the else case, where we are after the flip-flop, it gets reset to 0 so we can start over with the next block.

Since you know how to open and write files, I'll skip that.

use strict;
use warnings;

my $skip = 0;
while (<DATA>) {
    if (/^FIRST ITERATION$/ .. /^$/) {
        next if $skip++ <= 2;
        print $_;
    } else {
        $skip = 0;
    }
}
__DATA__
FIRST ITERATION
skip1
skip2
foo
bar
baz

don't print this

The output of this is:

foo
bar
baz

To stick with your own code, here's a very verbose solution that uses a foreach and no flip-flop. It does the same thing, just with a lot more words.

my $skip = 0;   # skip lines
my $match = 0;  # keep track of if we're in between the borders
foreach my $line (<DATA>) {
    if ( $line =~ m/^FIRST ITERATION/ ) {
        $match = 1; # we are inside the match
        next;
    }
    if ($line =~ m/^$/) {
        $match = 0; # we are done matching
        $skip  = 0; # reset the counter for the next block
        next;
    }
    if ($match) {
        $skip++;     # count skip-lines
        if ($skip <= 2) {
            next;    # ... and skip the first two
        }
        print $line; # this is the content we want  
    }
}
simbabque
1

Using paragraph mode (which returns blocks separated by blank lines rather than lines):

local $/ = "";  # Paragraph mode.

while (<>) {
    s/\n\K\n+//;  # Get rid of trailing empty lines.
    print /^FIRST ITERATION\n[^\n]*\n[^\n]*\n(.*)/ms;
}
ikegami
1

Using the flip-flop operator:

while (<>) {
    if (my $line_num = /^FIRST ITERATION$/ .. /^$/) {
        print if $line_num > 3 && $line_num !~ /E0/;
    }
}

$line_num !~ /E0/ is false when the flip-flop is flopping (i.e. for the first empty line after FIRST ITERATION). This is checked to avoid printing the blank line.

ikegami
  • The [flip-flop/range operators](http://perldoc.perl.org/perlop.html#Range-Operators) documentation covers the (somewhat obscure) `E0`: "*The final sequence number in a range has the string "`E0`" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint*." @ikegami - I almost rolled that into your question as a slight improvement :-) – G. Cito Mar 22 '16 at 15:57
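
To see the sequence numbers and the `E0` suffix mentioned in this comment, here is a small sketch that records what the flip-flop returns for each line. The sub name `sequence_values` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Return the flip-flop's value for every line passed in: the empty string
# outside the range, 1, 2, 3, ... inside it, and the final number with
# "E0" appended. (sequence_values is a name invented for this sketch.)
sub sequence_values {
    my @lines = @_;
    my @values;
    for (@lines) {
        my $seq = /^FIRST ITERATION$/ .. /^$/;
        push @values, $seq;
    }
    return @values;
}

my @sample = (
    "intro\n", "FIRST ITERATION\n", "skip1\n", "skip2\n",
    "foo\n",   "\n",                "after\n",
);

# Show empty strings as '' so they are visible in the output.
print join( ', ', map { length $_ ? $_ : "''" } sequence_values(@sample) ), "\n";
```

This should print `'', 1, 2, 3, 4, 5E0, ''`. The value `5E0` still compares as the number 5, which is why the `$line_num > 3` test alone would let the closing blank line through and the extra `!~ /E0/` check is needed.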