
I recently came across the following answer, which was very useful but gave no context about why the command works.

awk '/matched/,0' file

What does the 0 mean in the context of this awk command?

To expand on this, I'd like to understand if the literal string 0 has some special meaning in the context of the comma operator, and whether it has special meaning in other places in awk.

For example, awk '/matched/,1' file seems to behave the same as awk '/matched/' file, which just prints the lines that contain matched.

The documentation I found seems to make no mention of 0 when used as a substitute for a pattern.

merlin2011
  • Isn't the title self-explanatory? "*Print lines in file from the match line until end of file*". `0` makes the range until the end of file. – Wiktor Stribiżew Dec 06 '19 at 00:20
  • @WiktorStribiżew, I'd like a little more context or a link to documentation. A search for the comma operator in awk did not turn up anything. If `0` means end of file, what does `1` or `-1` mean? The only documented way I know of doing this is: https://unix.stackexchange.com/a/372108/34334 – merlin2011 Dec 06 '19 at 00:23

2 Answers


<condition 1>,<condition 2> in awk and other tools is a "range expression" which means "match the set of lines starting when condition 1 is true and ending when condition 2 is true".

0 is a false condition, so the end condition is never true and the block of lines continues until the end of the file.

Your specific range expression is:

/matched/,0

which is awk shorthand for:

match the lines starting when the condition $0 ~ /matched/ is true and ending when the condition 0 is true (i.e. never so the end of the file).
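As a minimal sketch of the point above, the following compares an always-false end condition (0) with an always-true one (1). The sample file and its contents are made up for illustration and are not from the question.

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'before' 'matched here' 'after 1' 'after 2' > "$sample"

# End condition 0 is always false, so the range never closes
# and every line from the first match to end of file is printed.
to_eof=$(awk '/matched/,0' "$sample")

# End condition 1 is always true, so the range closes on the same
# line that opened it and only the matching line is printed.
one_line=$(awk '/matched/,1' "$sample")

printf 'with 0:\n%s\n\nwith 1:\n%s\n' "$to_eof" "$one_line"
rm -f "$sample"
```

With 0 the output should run from "matched here" through "after 2"; with 1 it should be the single line "matched here", which is why the question saw the same result as a plain /matched/.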

Don't ever use range expressions: they make trivial tasks slightly briefer than using a flag, but anything slightly more interesting then requires a complete rewrite or duplicate conditions. See "Is a /start/,/end/ range expression ever useful in awk?" for details.
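For comparison, here is one way the flag-based alternative could look. This is a sketch, not necessarily the exact form the linked post recommends; the variable name found and the sample data are assumptions made for illustration.

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'before' 'matched here' 'after 1' 'after 2' > "$sample"

# Range expression: print from the first match to end of file.
range_out=$(awk '/matched/,0' "$sample")

# Flag version: set a flag on the match, then print while it is set.
flag_out=$(awk '/matched/ { found = 1 } found' "$sample")

# Excluding the matching line only needs the two rules swapped;
# the range form has no equally small change for this.
skip_out=$(awk 'found; /matched/ { found = 1 }' "$sample")

printf 'range:\n%s\n\nflag:\n%s\n\nflag, start line excluded:\n%s\n' \
    "$range_out" "$flag_out" "$skip_out"
rm -f "$sample"
```

The first two commands should print the same lines. The third shows the kind of small requirement change that is easy with a flag.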

Ed Morton
  • Is this knowledge that you happened to know, or is it documented somewhere that these are conditions rather than just patterns? – merlin2011 Dec 06 '19 at 00:48
  • There's no such thing as a pattern (yes, I know all the awk books, etc. use that term but they're wrong). An awk script is made up of `<condition> { <action> }` segments. A condition can be a keyword like BEGIN, or a test against a regexp like `/a.*b/` or `$0 ~ /a.*b/`, or a string comparison like `$0 == "foo"`, or a hard-coded value like 0 for false or non-zero for true. But it doesn't matter - nowhere in there is the word "pattern" used and it's all just testing conditions just like inside an `if () { }` statement. – Ed Morton Dec 06 '19 at 00:50
  • [GNU Awk User Guide on ranges](https://www.gnu.org/software/gawk/manual/html_node/Ranges.html#Ranges) – KamilCuk Dec 06 '19 at 00:52
  • @merlin2011, yes please it is documented, I just posted a small snippet of it too, cheers :) – RavinderSingh13 Dec 06 '19 at 01:07
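The kinds of condition listed in the comment above can be tried on their own, without a range. The following is a small sketch with made-up sample data; when a rule has a condition but no action, awk prints the record by default.

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'foo' 'a then b' 'bar' > "$sample"

none=$(awk '0' "$sample")             # constant false: prints nothing
every=$(awk '1' "$sample")            # constant true: prints every line
regex=$(awk '/a.*b/' "$sample")       # regexp test against $0
exact=$(awk '$0 == "foo"' "$sample")  # string comparison

printf 'none=[%s]\nevery=[%s]\nregex=[%s]\nexact=[%s]\n' \
    "$none" "$every" "$regex" "$exact"
rm -f "$sample"
```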

Though Ed Morton has already explained that the condition is FALSE, here is an example to illustrate it.

Let's say we have the following Input_file:

cat Input_file
test test test test test
test test test test test
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

Now we will try the OP's command:

awk '/>Cluster 145/,0' Input_file
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

To make this clearer, let's deliberately provide an end condition that never becomes TRUE anywhere in Input_file. Here the range runs from a line matching />Cluster 145/ to a line matching /singh/, but the string singh never occurs in Input_file:

awk '/>Cluster 145/,/singh/' Input_file
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

This gives the same result as using 0 as the end condition. So 0 is simply a condition that is always FALSE: the range never ends, and everything from the first matching line to the end of Input_file is printed.



From the gawk documentation (see the section on range patterns for the complete text): to turn the range off, the end pattern must match, which never happens in the case of /matched/,0.

awk '$1 == "on", $1 == "off"' myfile

prints every record in myfile between ‘on’/‘off’ pairs, inclusive. A range pattern starts out by matching begpat against every input record. When a record matches begpat, the range pattern is turned on, and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches endpat against every input record; when this succeeds, the range pattern is turned off again for the following record. Then the range pattern goes back to checking begpat against each record.
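The on/off behavior described in the quoted passage can be seen with a short sketch. The input lines below are invented for illustration and stand in for myfile.

```shell
# Hypothetical stand-in for myfile, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'skip' 'on' 'a' 'off' 'skip' 'on' 'b' 'off' 'skip' > "$sample"

# The range turns on at each "on" record and off after each "off" record,
# then goes back to looking for the next "on".
pairs=$(awk '$1 == "on", $1 == "off"' "$sample")

printf '%s\n' "$pairs"
rm -f "$sample"
```

Both on/off blocks should be printed, inclusive, and none of the "skip" lines. Replacing the end condition with 0 would keep the range on after the first "on" and print everything that follows.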

RavinderSingh13