How to extract only the first instance of a number of lines between two strings in bash?

Question

My file is:

abc
123
xyz
abc
675
xyz

And I want to extract:

abc
123
xyz

(123 could be anything, the point is I want the first occurrence)

I tried using this:

sed -n '/abc/,/xyz/p' filename

but this is giving me all the instances. How could I get just the first one?

Don't use a range expression (`/start/,/end/`): it makes trivial tasks only very slightly briefer than using a flag, and then needs a complete rewrite or duplicate conditions when your requirements change in the slightest. Just use a flag (which means you can't use sed, oh well) to indicate when you're in the block started by your first condition being true. – Ed Morton Jul 24 '20 at 12:43
What is the expected output if there are no lines matching `xyz` following the line matching `abc`? – M. Nejat Aydin Jul 24 '20 at 13:11
What's the expected output if `abc` occurs twice before the first `xyz`? – Ed Morton Jul 24 '20 at 13:18
Interesting questions. I should have mentioned it myself. 1. There is no possibility of abc occurring twice before xyz. A Maven Checkstyle plugin ensures that. 2. If no lines match, no output is "expected"; I'll make do with whatever I get, as long as it's definitive. – Vyom Maitreya Jul 27 '20 at 04:45

RavinderSingh13 · Accepted Answer · 2020-07-24T12:43:05.537

Could you please try the following, written and tested with the shown samples.

awk '/abc/{found=1} found; /xyz/ && found{exit}'  Input_file

Or, as per Ed sir's comment, try the following for better efficiency.

awk '/abc/{found=1} found{print; if (/xyz/) exit}'  Input_file

Explanation: Adding detailed explanation for above.

awk '               ##Starting awk program from here.
/abc/{              ##checking condition if a line has abc in it then do following.
  found=1           ##Setting found here.
}
found;              ##Checking condition if found is SET then print that line.
/xyz/ && found{     ##Checking if xyz found in line and found is SET then do following.
  exit              ##exit program from here.
}
'  Input_file       ##Mentioning Input_file name here.
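
As a quick check of the flag approach against one edge case raised in the comments (an `xyz` appearing before the first `abc`), here is a minimal sketch. The sample data and the names `tmp` and `first_block` are made up for illustration and are not part of the question.

```shell
# Hypothetical sample: a stray xyz precedes the first abc block.
tmp=$(mktemp)
printf '%s\n' xyz abc 123 xyz abc 675 xyz > "$tmp"

# Flag-based extraction: start printing at the first abc,
# stop reading as soon as the following xyz has been printed.
first_block=$(awk '/abc/{found=1} found{print; if (/xyz/) exit}' "$tmp")
printf '%s\n' "$first_block"

rm -f "$tmp"
```

Because the `/xyz/` test only runs once `found` is set, the stray leading `xyz` is skipped and only the first `abc` ... `xyz` block is printed.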

score 2 · Answer 2 · answered Jul 24 '20 at 11:51

If you don't mind Perl:

perl -ne 'm?abc?..m?xyz? and print' file

will print only the first block that matches. The delimiter for the matches must be the ? character.

JRFerguson

Does that exit after the first block is found or will it keep going, reading all the lines after it? I honestly don't know as I'm not familiar with perl syntax and how the various options affect what the script means/does. – Ed Morton Jul 24 '20 at 12:48
@EdMorton Yes, the 'm?PATTERN?..m?PATTERN?' stops matching after the first block. This once-only matching is described in more detail in [perlop](https://perldoc.perl.org/5.32.0/perlop.html#Range-Operators). – JRFerguson Jul 24 '20 at 13:43
But it still *reads* the rest of the file, doesn't it? Just doesn't print? – Paul Hodges Jul 24 '20 at 15:34
@PaulHodges Yes, it still _reads_ the file. The `range` (...) operator becomes true on a match with the first operand and remains true until the second operand is true. Using `?` as the match delimiter limits the action to a once-only match unless a `reset` operator is used (as for processing a new file when an EOF on the first file is detected). Saying `...and print` is short for `if (...) {print}`. – JRFerguson Jul 24 '20 at 16:26
I'd just rather see it abort reading past the point where we know we're done. Nit-picky. :) – Paul Hodges Jul 24 '20 at 16:31
@PaulHodges There's more to using the `range` operator than shown in my example. The one-liner with one input file could be written to stop reading upon finding the end of block by testing for the terminal condition twice as others have noted. YMMV and TMTOWTDI – JRFerguson Jul 24 '20 at 17:05

score 1 · Answer 3 · answered Jul 24 '20 at 11:48

Using sed you can do:

sed -n '/abc/,/xyz/p; /xyz/q' filename

q will quit after the "xyz" pattern is reached.

Maroun

In a file where "xyz" appears before "abc", this might result in something you don't want. – kvantour Jul 24 '20 at 12:10
@kvantour But that's not the example provided by the OP. – Todd A. Jacobs Jul 24 '20 at 13:15
@Todd Yes, but we like foolproof answers better in shell related tags – oguz ismail Jul 24 '20 at 13:18
There are 3 cases the OP didn't address - 1) `xyz` before the first `abc`, 2) no `xyz` after the `abc`, and 3) 2 `abc`s before the first `xyz`. We can debate how we think the OP would want 2 and 3 handled but there's no question how `1` should be handled (ignore it) and it's trivial to handle so a reasonable answer would simply do so. – Ed Morton Jul 24 '20 at 13:35
You could use `sed '/abc/,$! d ; /xyz/ q' input` to avoid the duplicate pattern and deal with `xyz` before `abc`. – luciole75w Jul 24 '20 at 14:03
Thanks for your comments. I believe OP can take the answer and improve it however they wish. – Maroun Jul 24 '20 at 14:05
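
To make the concern in these comments concrete, here is a small sketch comparing the answer's command with the variant suggested by luciole75w. The sample input and the variable names `tmp`, `naive`, and `robust` are hypothetical, and the semicolon-separated scripts assume GNU sed.

```shell
# Hypothetical sample: xyz appears before the first abc.
tmp=$(mktemp)
printf '%s\n' xyz abc 123 xyz abc 675 xyz > "$tmp"

# Range plus unconditional quit: sed quits on the stray leading xyz,
# before the range has started, so nothing is printed.
naive=$(sed -n '/abc/,/xyz/p; /xyz/q' "$tmp")

# Variant from the comment: delete every line before the first abc,
# then quit (printing the current line) at the next xyz.
robust=$(sed '/abc/,$!d; /xyz/q' "$tmp")

printf 'naive:  [%s]\n' "$naive"
printf 'robust: [%s]\n' "$robust"

rm -f "$tmp"
```

With this input, `naive` is empty while `robust` holds the first `abc` ... `xyz` block.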

Todd A. Jacobs · Answer 4 · 2020-07-24T12:22:20.080

1

Match the Terminal Condition Twice

Regardless of the language, the most common technique for line-oriented processing is to print lines within a given range and then use a second command to exit the loop when your terminal condition is reached. This will be true for common patterns in sed, awk, ruby, and perl, although there are certainly other techniques that can be performed using multi-line matches (not supported in sed without using the hold space). For example, you might use a non-greedy, multi-line regular expression pattern such as /^abc\n.*?\nxyz$/m.

To illustrate the line-oriented approach you want a little more verbosely, consider this Ruby one-liner where $_ holds the current input line. From the shell:

$ ruby -ne 'puts $_ if /^abc$/ .. /^xyz$/; exit if /^xyz$/' filename
abc
123
xyz

The equivalent in sed is:

$ sed -n '/^abc$/,/^xyz$/p; /^xyz$/q' filename
abc
123
xyz

All you were missing was a quit or exit command attached to the second match against the first instance of xyz.

edited Jul 24 '20 at 12:22

answered Jul 24 '20 at 12:08

Duplicate code such as specifying the same condition twice is usually (always?) a bad idea and idk what the ruby code does but the sed code would print nothing and exit if the first `xyz` appeared before the first `abc` in the input. – Ed Morton Jul 24 '20 at 12:42
@EdMorton No. Sequenced instructions are the basic building blocks of any program. While some language features (e.g. the sed hold space, Perl/Ruby flip-flop operators, or multiline matches in supported regular expression engines) may arguably allow you to avoid the need for a separate pattern match as opposed to backtracking, you aren't really avoiding a second instruction in any approach I can currently conceive. If you'd like to prove me wrong, please contribute your own answer for consideration. – Todd A. Jacobs Jul 24 '20 at 13:10
idk what part of my comment you're saying `No` to but wrt what I'd do: since he was close I just commented on [@RavinderSingh13's answer](https://stackoverflow.com/a/63073005/1745001) so he included my suggestion in his but it's just `awk '/abc/{f=1} f{print; if (/xyz/) exit}'`. – Ed Morton Jul 24 '20 at 13:12
@EdMorton Count your conditions. You're just trading a pattern match for a variable test. You now have three commands/statements rather than two. – Todd A. Jacobs Jul 24 '20 at 13:18
I don't care about how many conditions are present in the code as that's completely irrelevant. In addition to providing the expected output for the posted sample input and reasonably predictable other input I care about not having duplicate code (so I don't have to change 2 places in my code when my start or end regexp changes) and where in the code the conditions are tested (so I'm only performing a test when necessary) and exiting as soon as the first/only possible matching block is printed. – Ed Morton Jul 24 '20 at 13:21
@EdMorton If you think your example is somehow more DRY, you're entitled to that viewpoint, although the value of DRY vs. DAMP in a one-liner is dubious at best. My examples work with the OP's example input and expected output, and if you don't like the way I explained the underlying approach that's up to you. I prefer to optimize for clarity and specificity; if you'd like to optimize for something else, that's fine too. [TIMTOWTDI](https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to_do_it). – Todd A. Jacobs Jul 24 '20 at 13:32
Producing the expected output for a given sample input set is the starting point to identifying a solution, not the end point. Duplicate code is bad. Needlessly inefficient code is bad. Code that will clearly, obviously, and unnecessarily fail given a minor, predictable change to the input is bad. That's not just my viewpoint, it's programming fundamentals. I guess we'll just have to agree to disagree on how important they are. – Ed Morton Jul 24 '20 at 13:44

Paul Hodges · Answer 5 · 2020-07-24T14:51:11.357

This has already been well and sufficiently answered and hashed out by better minds than me, but

since you explicitly used sed, and
for some variety of approach that handles the requested conditions...

sed -n '/abc/,/xyz/{ p; /xyz/q; }' filename

This only looks at the range in question, so it won't print or quit on an xyz with no opening abc ahead of it.
It prints all the records in the range.
It exits on the first xyz it sees AFTER an abc, so it will dependably exit.
If the input has no closing xyz, it will just print to EOF.

You might refine the pattern if you want to make sure of exact matching, such as

sed -n '/^abc$/,/^xyz$/{ p; /^xyz$/q; }' filename

This prevents near-matches from confusing the logic, but is (intentionally) unforgiving of botched sentinel strings.

For non-GNU sed, replace the semicolons with newlines. – Paul Hodges Jul 24 '20 at 15:35
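
A small sketch of how the anchored and unanchored forms differ on a near-match, and of the print-to-EOF behavior when no closing `xyz` exists. The input lines (including `abcdef`) and the names `tmp`, `loose`, `strict`, and `unterminated` are invented for illustration.

```shell
tmp=$(mktemp)

# Hypothetical input with a near-match (abcdef) ahead of the real sentinel.
printf '%s\n' abcdef abc 123 xyz abc 675 xyz > "$tmp"

# Unanchored: the range opens on abcdef, because it contains abc.
loose=$(sed -n '/abc/,/xyz/{ p; /xyz/q; }' "$tmp")

# Anchored: only a line that is exactly abc opens the range.
strict=$(sed -n '/^abc$/,/^xyz$/{ p; /^xyz$/q; }' "$tmp")

# No closing xyz: the range never ends, so everything from abc to EOF prints.
printf '%s\n' abc 123 456 > "$tmp"
unterminated=$(sed -n '/^abc$/,/^xyz$/{ p; /^xyz$/q; }' "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$loose" "$strict" "$unterminated"

rm -f "$tmp"
```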

score 0 · Answer 6 · answered Jul 24 '20 at 19:38

This might work for you (GNU sed):

 sed '/abc/!d;:a;n;/xyz/!ba;q' file

If it is not a line containing abc delete it.

Otherwise, print it and fetch the next.

If that line is not xyz, repeat.

Otherwise, quit.

N.B. The option -n is not set and so the last line will be printed before termination.

This will print until the end of the file or the string xyz is encountered.

If xyz must be present, use:

sed -n '/abc/!d;:a;N;/xyz/!ba;p;q' file
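
The difference between the two commands shows up only when the closing `xyz` is missing. The sketch below assumes GNU sed in its default (non-POSIX) mode; the sample data and the names `tmp`, `with_n`, `with_N`, and `complete` are hypothetical.

```shell
tmp=$(mktemp)

# Hypothetical input where the block is never closed by xyz.
printf '%s\n' abc 123 > "$tmp"

# n form: each line is printed as the loop advances, so the partial
# block is emitted up to the end of the file.
with_n=$(sed '/abc/!d;:a;n;/xyz/!ba;q' "$tmp")

# N form with -n: lines are accumulated and printed only once xyz is
# seen, so an unterminated block produces no output.
with_N=$(sed -n '/abc/!d;:a;N;/xyz/!ba;p;q' "$tmp")

# With a complete block, the N form prints the first block and quits.
printf '%s\n' abc 123 xyz abc 675 xyz > "$tmp"
complete=$(sed -n '/abc/!d;:a;N;/xyz/!ba;p;q' "$tmp")

printf 'n form: [%s]\nN form: [%s]\ncomplete: [%s]\n' "$with_n" "$with_N" "$complete"

rm -f "$tmp"
```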
