How do I use awk to print starting at one pattern, end at another pattern, then exit?

Question

I have a text file that has the following format:

50000

55000

60000

65000

150000

160000

I want to print everything starting at 50000 and ending at 60000. What I tried was:

awk "/50000/,/60000/ {print}"

But this also prints the 150000 and 160000. How should I modify this?

If you provide an 'end range' that doesn't exist in the file (e.g., `70000`), what would you expect as the output: everything from `50000` to the end of the file, or nothing? A similar question applies if the 'start range' doesn't exist: everything from the beginning of the file to the 'end range', or nothing? And if neither the 'start range' nor the 'end range' exists in the file, should the entire file be displayed, or nothing? Other questions: is the input file guaranteed to already be sorted numerically? Is each number unique within the file, or can a number occur more than once? – markp-fuso, Jan 24 '22 at 23:34
Does this answer your question? [How to use awk to extract a line with exact match](https://stackoverflow.com/questions/17960758/how-to-use-awk-to-extract-a-line-with-exact-match) — Wiktor Stribiżew, Jan 25 '22 at 12:38
Depends what you mean by "pattern" (see [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern)) and what else you might want to do (see [is-a-start-end-range-expression-ever-useful-in-awk](https://stackoverflow.com/questions/23934486/is-a-start-end-range-expression-ever-useful-in-awk)). — Ed Morton, Jan 25 '22 at 13:34

score 2 · Answer 1 · answered Jan 25 '22 at 13:37

Robustly and efficiently you'd do:

awk '$1==50000{f=1} f{print; if ($1==60000) exit}' file

The exit is so awk doesn't continue wasting time reading the input long after the last line you want to process.

The above assumes that if 60000 didn't exist in the input but 50000 did then you'd want to print the lines from 50000 to the end of the file. If that's not the case then:

awk '$1==50000{f=1} f{ buf=buf $0 ORS; if ($1==60000) {printf "%s", buf; exit} }' file
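To make the difference between the two commands concrete, here is a minimal sketch that runs both on input where 50000 is present but 60000 is missing. The `sample` function and its values are made up for illustration and are not from the question.

```shell
# Hypothetical sample input: the start line exists, the end line does not.
sample() { printf '%s\n' 40000 50000 55000 65000; }

# Variant 1: prints from 50000 to the end of input when 60000 never appears.
to_eof=$(sample | awk '$1==50000{f=1} f{print; if ($1==60000) exit}')

# Variant 2: buffers the lines and prints them only once 60000 is seen.
all_or_nothing=$(sample | awk '$1==50000{f=1} f{buf=buf $0 ORS; if ($1==60000) {printf "%s", buf; exit}}')

printf 'variant 1:\n%s\n' "$to_eof"
printf 'variant 2: [%s]\n' "$all_or_nothing"
```

With this input, variant 1 prints 50000, 55000, and 65000, while variant 2 prints nothing because the end line is never found.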

The fourth bird · Accepted Answer · 2022-01-25T20:06:22.243

1

Currently, the range patterns also match partially: 50000 matches inside 150000 and 60000 matches inside 160000, so you are printing the first range:

50000
55000
60000

and then a second range:

150000
160000

If you want to match the whole line without partial matches, you can use anchors for the start and the end pattern.

awk '/^50000$/,/^60000$/' file
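As a quick check of the anchoring, the sketch below runs the unanchored and the anchored range on the six values from the question. The `sample` helper is only an illustrative stand-in for `file`.

```shell
# The six values from the question, one per line.
sample() { printf '%s\n' 50000 55000 60000 65000 150000 160000; }

# Unanchored: 150000 restarts the range and 160000 ends it again.
unanchored=$(sample | awk '/50000/,/60000/')

# Anchored: only lines that are exactly 50000 or 60000 start or end the range.
anchored=$(sample | awk '/^50000$/,/^60000$/')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored form prints five lines (including 150000 and 160000), while the anchored form prints only 50000, 55000, and 60000.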

edited Jan 25 '22 at 20:06

answered Jan 24 '22 at 23:06

The fourth bird

You can remove the `{print}` and let the default print operation handle the output. I suspect you know that and have included it for readability. – David C. Rankin Jan 25 '22 at 03:15

@EdMorton Sorry for the late response, I was not behind my machine. Thanks for the feedback, always a pleasure. – The fourth bird Jan 25 '22 at 20:07

score 1 · Answer 3 · answered Jan 25 '22 at 01:04

Best practice with awk is to not use a sed style regex range.

Instead, set a flag to start printing and another flag to stop (and perhaps exit.)

Example:

seq 100 | awk '
/^22$/{f=1}
/^29$/{exit}
f'

Prints:

22
23
24
25
26
27
28

answered Jan 25 '22 at 01:04

dawg

score 1 · Answer 4 · answered Jan 25 '22 at 01:21

If you're not matching a regular expression, you can use an equality comparison as the range criteria instead:

$ awk '$0==50000,$0==60000' file

will give you the desired range.
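One detail worth knowing about the equality form is that awk compares a numeric-looking input line to a number numerically. The sketch below illustrates this with a made-up input that contains `50000.0`; the `sample` helper and its values are assumptions for illustration only.

```shell
# Hypothetical input: the start value is written as 50000.0.
sample() { printf '%s\n' 150000 50000.0 55000 60000 65000 160000; }

# Equality range: 50000.0 equals 50000 numerically, so the range starts there.
by_equality=$(sample | awk '$0==50000,$0==60000')

# Anchored regex range: the text "50000.0" is not exactly "50000", so nothing starts.
by_regex=$(sample | awk '/^50000$/,/^60000$/')

printf 'equality:\n%s\n\nregex: [%s]\n' "$by_equality" "$by_regex"
```

Here the equality range prints 50000.0, 55000, and 60000, and the anchored regex range prints nothing. Which behavior is preferable depends on whether the data should be treated as numbers or as text.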

answered Jan 25 '22 at 01:21

karakfa

Like the succinct range expression and default print. (and since the other 2 answers are correct as well, each worthy of a nod) – David C. Rankin Jan 25 '22 at 03:10

score 1 · Answer 5 · answered Jan 25 '22 at 03:50

Also, numeric comparison works:

awk '50000 <= $1 && $1 <= 60000' file

The print is implicit here.
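Note that the numeric comparison selects lines by value rather than by position, so it matches a range expression only when the input is sorted. A minimal sketch of the difference, using a made-up unsorted input (the `sample` helper and the trailing 52000 are assumptions, not part of the question):

```shell
# Hypothetical unsorted input: 52000 appears after the range has ended.
sample() { printf '%s\n' 50000 55000 60000 65000 52000; }

# Filter by value: every line between 50000 and 60000, wherever it occurs.
by_value=$(sample | awk '50000 <= $1 && $1 <= 60000')

# Filter by position: lines from the 50000 line through the 60000 line.
by_position=$(sample | awk '$1==50000,$1==60000')

printf 'by value:\n%s\n\nby position:\n%s\n' "$by_value" "$by_position"
```

The value filter also prints the trailing 52000, while the range stops at 60000.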

answered Jan 25 '22 at 03:50

glenn jackman

score 0 · Answer 6 · answered Jan 25 '22 at 05:20

You can also go for a string-based approach:

gawk '/^(5[0-9]{4}|6[0]{4})$/' file    # gawk or nawk

mawk '/^(5[0-9][0-9][0-9][0-9]|60000)$/' file    # mawk or mawk2

I'd recommend against [[:digit:]] in place of [0-9] since non-C/POSIX locales may result in matching multi-byte "digits", such as those in Unicode.

score 0 · Answer 7 · answered Jan 25 '22 at 10:07

how do I use awk to print starting at pattern, end at another pattern then exit?

If you are interested solely in the first range, just exit at the first occurrence of the closing pattern. Let file.txt content be

50000
55000
60000
65000
150000
160000

then

awk '/50000/,/60000/{print}/60000/{exit}' file.txt

output

50000
55000
60000

Note that this code stops processing as soon as it encounters the first /60000/, which is useful if you have a huge file and are interested only in the first range, which is located near the start.

(tested in gawk 4.2.1)
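One way to see the early exit at work is to report `NR` from an `END` rule, since `END` still runs after `exit` and `NR` then holds the number of records read so far. This is a sketch under the assumption that the input is the six values from the question; the `sample` helper and the `END` rule are additions for illustration.

```shell
# The six values from the question, one per line.
sample() { printf '%s\n' 50000 55000 60000 65000 150000 160000; }

# Same command as above, plus an END rule that reports how many records were read.
out=$(sample | awk '/50000/,/60000/{print} /60000/{exit} END{print "records read: " NR}')

printf '%s\n' "$out"
```

Only three of the six records are read before awk stops, so the remaining lines are never processed.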
