3

Using one line of Perl code, what is the shortest way possible to print all the lines between two patterns not including the lines with the patterns?

If this is file.txt:

aaa
START
bbb
ccc
ddd
END
eee
fff

I want to print this:

bbb
ccc
ddd

I can get most of the way there using something like this:

perl -ne 'print if (/^START/../^END/);'

That includes the START and END lines, though.

I can get the job done like this:

perl -ne 'if (/^START/../^END/) { print unless (/^(START|END)/); };' file.txt

But that seems redundant.

What I'd really like to do is use lookbehind and lookahead assertions like this:

perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt

But that doesn't work and I think I've got something just a little bit wrong in my regex.

These are just some of the variations I've tried that produce no output:

perl -ne 'print if (/^(?<=START)/../^.*$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=^END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/s);' file.txt
Vince
  • See the part about sequence numbers in perlop: http://perldoc.perl.org/perlop.html#Range-Operators – ThisSuitIsBlackNot Mar 11 '16 at 07:11
  • @ThisSuitIsBlackNot Thank you. I saw "You can exclude the beginning point by waiting for the sequence number to be greater than 1." That allows me to skip printing the first pattern using `perl -ne 'print if ((/^START/../^END/) > 1);' file.txt`, but it's not a fixed number of lines so I can't exclude the last pattern in the range. – Vince Mar 11 '16 at 07:46
  • "The final sequence number in a range has the string `E0` appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint." – ThisSuitIsBlackNot Mar 11 '16 at 07:52
  • @ThisSuitIsBlackNot Hah! Thank you :) I saw that at the page you linked and didn't register its meaning until now. While trying to test it, I was using `printf("%d\n")` so I didn't see the `E0`. Now I have `perl -ne '$s = /^START/../^END/; print if ($s > 1 && $s !~ /E0/);' file.txt`. It's effective and shorter than my `unless` version, but I was really hoping to make it work with the *lookarounds*. – Vince Mar 11 '16 at 08:09
  • With lookarounds, you need to read the whole text in and use something like `/(?<=^START\n)(?:(?!^END$).)*/sm`. The tempered greedy token combined with an unanchored lookbehind is actually an overkill and in case the input is large, this is a very inefficient approach. – Wiktor Stribiżew Mar 11 '16 at 08:23

4 Answers

3

Read the whole file, match, and print.

perl -0777 -e 'print <> =~ /START.*?\n(.*?)END.*?/gs;' file.txt

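If START and END are alone on their lines, the .*? after each of them can be dropped. Dropping the \n as well leaves each captured segment starting with a newline, so a blank line is printed ahead of every segment and separates consecutive segments.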


Read the whole file, split it on START|END, and print the odd-indexed elements of @F.

perl -0777 -F"START|END" -ane 'print @F[ grep { $_ & 1 } (0..$#F) ]' file.txt

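Collect the lines of the range and print them after the loop. The }{ closes the implicit while loop that -n wraps around the code, so the print runs once at the end of input, much as it would in an END { } block.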

perl -ne 'push @r, $_ if (/^START/../^END/); }{ print @r[1..$#r-1]' file.txt

Works as it stands only for a single such segment in the file.

zdim
  • Thanks! You gave me the shortest version in your first example. My first and second patterns are alone on the line, so I ended up with `perl -0 -e 'print <> =~ /START\n(.*?)END/gs;' file.txt`. I didn't know about either `-0` to change the record separator or the diamond operator (that was tough to find the name of). – Vince Mar 12 '16 at 09:06
  • @Vince Glad it helped :). Another useful version of it is `-00`, for paragraph mode [perlrun](http://perldoc.perl.org/perlrun.html) Your final line was my first take as well.. – zdim Mar 12 '16 at 09:39
1

It seems kind of arbitrary to place a single-line restriction on this, but here's one way to do it:

$ perl -wne 'last if /^END/; print if $p; $p = 1 if /^START/;' file.txt
Matt Jacob
  • That'll get the job done, but I was hoping for a way to make it work with the look-ahead / look-behind assertions, or an explanation as to why that idea doesn't work. I'll accept your answer if no one offers a solution using the *lookarounds* in a couple days. – Vince Mar 11 '16 at 07:50
1
perl -e 'print split(/.*START.|END.*/s, join("", <>))' file.txt

perl -ne 'print if /START/../END/' file.txt | perl -ne 'print unless $.==1 or eof'

perl -ne 'print if /START/../END/' file.txt | sed -n -e '$d' -e '1!p'
cdlane
1

I don't see why you are so insistent on using lookarounds, but here are a couple of ways to do it.

perl -ne 'print if /^(?=START)/../^(?=END)/'

This locates the delimiter lines without consuming any of their text: each pattern makes a zero-length match at the start of the line, at the point where the lookahead succeeds.

Your lookbehind wasn't working because it was trying to find beginning of line ^ with START before it on the same line, which can obviously never match. Factor the ^ into the zero-width assertion and it will work:

perl -ne 'print if /(?<=^START)/../(?<=^END)/'

As suggested in comments by @ThisSuitIsBlackNot you can use the sequence number to omit the START and END tokens.

perl -ne '$s = /^START/../^END/; print if ($s>1 && $s !~ /E0/)'

The lookarounds don't contribute anything useful so I did not develop those examples fully. You can adapt this to one of the lookaround examples above if you care more about using lookarounds than about code maintainability and speed of execution.

tripleee
  • Thanks! At first using lookarounds seemed like the cleanest and easiest to understand solution. The file I'm working with is small and it's not executed as part of a user-facing process. I'm just saving myself some time copying and pasting from one file into another. So, performance isn't a concern in my case. If something like `/(?=START)/../^(?=END)/` could work without checking the sequence number, it would have been easy to read and maintain. Of course, that's just my opinion. – Vince Mar 12 '16 at 09:21
  • So you were thinking the lookarounds would help match without printing? Of course, in this case, you are examining an isolated line, and the whole line is always printed regardless of which part of the line matched. Slurping the entire file and substituting is different; when you substitute the part which matched, it does matter whether the regex captures the delimiter or not. – tripleee Mar 12 '16 at 09:33