I have an ASCII log file with some content I would like to extract. I've never taken the time to learn Perl properly, but I figure it is a good tool for this task.

The file is structured like this:

... 
... some garbage 
... 
... garbage START
what i want is 
on different
lines 
END 
... 
... more garbage ...
next one START 
more stuff I want, again
spread 
through 
multiple lines 
END 
...
more garbage

So, I'm looking for a way to extract the lines between each pair of START and END delimiter strings. How can I do this?

So far, I've only found examples of how to print the line containing the START string, and other documentation that is only loosely related to what I'm looking for.

jbatista
  • Use the global match /g rather than letting it stop at the line terminator. – Lazarus Jul 31 '09 at 14:21
  • you meant /s ? AFAIK /g is **multiple** match. – Steve Schnepp Jul 31 '09 at 14:45
  • This is a duplicate question. See http://stackoverflow.com/questions/296366/how-can-i-extract-lines-of-text-from-a-file/296672#296672 – draegtun Jul 31 '09 at 15:38
  • See also [How to print lines between two patterns, inclusive or exclusive](https://stackoverflow.com/questions/38972736/how-to-print-lines-between-two-patterns-inclusive-or-exclusive-in-sed-awk-or) – Sundeep Oct 30 '20 at 08:22

6 Answers

You want the flip-flop operator (also known as the range operator) ..

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
  if (/START/../END/) {
    next if /START/ || /END/;
    print;
  }
}

Replace the call to print with whatever you actually want to do (e.g., push the line into an array, edit it, format it, whatever). I'm next-ing past the lines that actually have START or END, but you may not want that behavior. See this article for a discussion of this operator and other useful Perl special variables.
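
As a rough sketch of the "push the line into an array" idea, the scalar return value of the range operator can be used instead of re-matching the delimiters: it is a sequence number that is 1 on the START line and ends in "E0" on the END line. The helper name extract_blocks is hypothetical, not something from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of array refs, one per START..END block,
# each holding the lines strictly between the two delimiter lines.
sub extract_blocks {
    my ($fh) = @_;
    my @blocks;
    while (my $line = <$fh>) {
        # In scalar context the range operator returns a sequence number:
        # 1 on the START line, and a value ending in "E0" on the END line.
        my $seq = ($line =~ /START/) .. ($line =~ /END/);
        next unless $seq;           # outside any block
        if ($seq == 1) {            # START line: begin a new block
            push @blocks, [];
            next;
        }
        next if $seq =~ /E0$/;      # END line: skip the delimiter itself
        push @{ $blocks[-1] }, $line;
    }
    return @blocks;
}
```

It could be called as extract_blocks(\*STDIN), or with any other open filehandle. One caveat of this sketch: the operator keeps its state between calls, so an input that ends in the middle of a block would affect the next call.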

Telemachus
  • Works for me!! Since I want to exclude the lines with the delimiters, I can pipe the output through grep -v for example. BTW, in the first line after START, how could I remove the first character in a line? – jbatista Jul 31 '09 at 14:33
  • The one-liner version: `perl -ne 'print if /START/../END/'` – William Pursell Jul 31 '09 at 17:32
  • William, that will print the lines with START and END. If you don't want them, here's the one-liner version of Telemachus' code: `perl -ne 'if (/START/../END/) {print unless /START/ or /END/}'` – glenn jackman Jul 31 '09 at 18:09
  • @Telemachus - How would I get this to work with a variable rather than reading from a file. Say I have a `$variable = "dont want this part START i want this part instead END";` ? Because I am having trouble getting the same effect when it is a variable as opposed to a file, your help is much appreciated, thanks – yonetpkbji May 03 '13 at 09:26
  • @perl-user If the string is longer (and especially if it's separated by newlines or something very regular), then you can use `open` and treat a string variable like a filehandle. But if the string really looks like what you have here, it doesn't seem worth it. You could just use substitution to remove the parts you don't want: `s/^.*START //` and then `s/ END$//`, for example. If it's more complex, I would open a new question. – Telemachus May 04 '13 at 11:18
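
A minimal sketch of both suggestions from the last comment, assuming a multi-line string for the first case. Passing a reference to a scalar as the third argument of `open` gives an in-memory filehandle, so the flip-flop loop works unchanged. The sample strings and the names @wanted and $inner are made up for illustration.

```perl
use strict;
use warnings;

# Case 1: a multi-line string, read line by line through an in-memory filehandle.
my $variable = "dont want this part\nSTART\ni want this part\ninstead\nEND\nnor this\n";
open my $fh, '<', \$variable or die "Cannot open in-memory handle: $!";

my @wanted;
while (my $line = <$fh>) {
    if (($line =~ /START/) .. ($line =~ /END/)) {
        next if $line =~ /START/ || $line =~ /END/;   # drop the delimiter lines
        push @wanted, $line;
    }
}
close $fh;
print @wanted;

# Case 2: everything on one line; a single non-greedy capture is enough.
my $single = "dont want this part START i want this part instead END";
my ($inner) = $single =~ /START\s*(.*?)\s*END/s;
print "$inner\n" if defined $inner;
```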

From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?


You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
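
The slurp-mode one-liner above can also be sketched as a small function, assuming the whole input fits in memory. The name extract_spans is hypothetical. The /s flag lets . match newlines, and /g in list context returns every captured group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between each START and the nearest END.
# Meant to be called in list context.
sub extract_spans {
    my ($content) = @_;
    # Non-greedy .*? stops at the first END after each START;
    # /s lets . cross line boundaries, /g collects all matches.
    return $content =~ /START(.*?)END/gs;
}
```

To feed it a whole file, the content can be slurped first, for example with `my $content = do { local $/; <STDIN> };`, which is what the -0777 switch does for the one-liner.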

Here's another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
# now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}
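
One way the "choose between them" step might be filled in, as a sketch only. It assumes a mail-like input (header lines, a blank line, then the body) and reads a made-up sample string through an in-memory filehandle so that it is self-contained. The arrays @header and @body are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical sample input: header lines, a blank separator line, then the body.
my $message = "Subject: test\nFrom: someone\n\nfirst body line\nsecond body line\n";
open my $fh, '<', \$message or die "Cannot open in-memory handle: $!";

my (@header, @body);
while (<$fh>) {
    my $in_header = 1 .. /^$/;          # a constant operand is compared with $.
    my $in_body   = /^$/ .. eof($fh);   # from the blank line to the last line
    if ($in_header) {
        push @header, $_ unless /^$/;   # the blank separator belongs to neither part
    }
    elsif ($in_body) {
        push @body, $_;
    }
}
close $fh;

print "HEADER: $_" for @header;
print "BODY:   $_" for @body;
```

Both flags are true on the blank separator line, which is why the header branch is tested first and skips that line explicitly.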
brian d foy

How can I grab multiple lines after a matching line in Perl?

How about that question? There, the END pattern is ^$ (an empty line); you can change it to your END string.

I am also a novice, but the solutions there offer quite a few methods. Let me know more specifically how what you want differs from the question linked above.

Dirk
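my $f = 0;                  # flag: true while inside a START..END block
while (<>) {
    chomp;                  # strip record separator
    if (/END/) { $f = 0; }  # the END line closes the block and is not printed
    if (/START/) {
        s/.*START//g;       # keep only the text after START on that line
        $f = 1;
    }
    print $_ . "\n" if $f;
}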

try to write some code next time round

ghostdog74
  • I understand, and I would have written some code if I had already started to learn Perl. I've managed to go by with awk and sed so far. But anyway thanks for your advice. – jbatista Jul 31 '09 at 14:35

After Telemachus' reply, things started falling into place. This is the solution I ended up with.

  1. I'm trying to extract the lines between two delimiter lines (the first ends with "CINFILE="; the second contains only a single "#"), excluding the delimiter lines themselves. Telemachus' solution does this.
  2. The first line of each block has a leading space that I want to remove. The code below handles that as well.
  3. I'm also trying to write each set of lines to a separate file.

This works for me, although the code could fairly be called ugly, since I'm currently a virtual newcomer to Perl. Anyway, here goes:

#!/usr/bin/env perl
use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {
  if (/$start/../$stop/) {
    $filename=sprintf("boletim_%06d.log",$counter);
    open($output,'>>'.$filename) or die $!;
    next if /$start/ || /$stop/;
    if($found == 0) { print $output (split(/ /))[1]; }
    else { print $output $_; }
    $found=1;
  } else { if($found == 1) { close($output); $counter++; $found=0; } }
}

I hope it benefits others as well. Cheers.

jbatista

Not too bad for a "virtual newcomer". One thing you could do is move the "$found=1" inside the "if($found == 0)" block, so that you don't repeat that assignment for every line between $start and $stop.

Another thing that is a bit ugly, in my opinion, is that you reopen the same filehandle for every line inside the $start/$stop block.

This shows a way around that:

#!/usr/bin/perl

use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {

    # Find block of lines to extract
    if( /$start/../$stop/ ) {

        # Start of block: open the output file once per block
        if( /$start/ ) {
            $filename=sprintf("boletim_%06d.log",$counter);
            open($output,'>>'.$filename) or die $!;
        }
        # End of block: close the file and get ready for the next block
        elsif ( /$stop/ ) {
            close($output);
            $counter++;
            $found = 0;
        }
        # Middle of block
        else{
            if($found == 0) {
                # First line of the block: drop the leading space
                print $output (split(/ /))[1];
                $found=1;
            }
            else {
                print $output $_;
            }
        }

    }

}
dala
  • Thanks. I now feel I should waste^H^H^H^H^Huse some time to properly learn Perl. My background is in C, some C++ and some Fortran, so it does seem familiar. – jbatista Aug 07 '09 at 10:20
  • BTW, I admit I was lax about opening so many files; my main concern at the time was to get something that worked, even if not too well. – jbatista Aug 07 '09 at 10:23