The following command outputs the header of a file and sorts the records after the header. But how does it work? Can anyone explain this command?

awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'
kvantour

3 Answers

Please go through the following annotated version (for explanation purposes only). To learn more awk concepts, I suggest going through Stack Overflow's awk learning section.

awk '                    ##Start of the awk program.
NR == 1;                 ##If this is the first line, print it.
##awk rules are condition-then-action; NO ACTION is given here, so the default action (print the current line) runs.
NR > 1{                  ##For every line after the first, do the following.
 print $0 | "sort -k3"   ##Pipe the line to sort -k3, which sorts from the 3rd field on and prints once its input ends.
}'
RavinderSingh13

Understanding the awk command: Overall, an awk program is built out of (pattern){action} pairs, which state that if pattern evaluates to true (a non-zero number or a non-empty string), action is executed. One does not necessarily need to write both. If pattern is omitted, it defaults to 1 (always true), and if action is omitted, it defaults to print $0.
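The two defaults can be checked with a small sketch. The temporary file and its three sample lines are made up for illustration; they are not taken from the question.

```shell
# Hypothetical sample input, used only for illustration.
sample=$(mktemp)
printf '%s\n' 'id name city' '1 bob paris' '2 amy oslo' > "$sample"

# Pattern only: the action defaults to { print $0 }.
only_pattern=$(awk 'NR == 1' "$sample")

# Action only: the pattern defaults to true, so every record matches.
only_action=$(awk '{ print $2 }' "$sample")

# Both parts written out; equivalent to the pattern-only form above.
explicit=$(awk 'NR == 1 { print $0 }' "$sample")

printf '%s\n' "$only_pattern" "$only_action" "$explicit"
rm -f "$sample"
```

Here only_pattern and explicit should both hold the header line, while only_action holds the second field of every line.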

When looking at the command in question:

awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'

We notice that there are two pattern-action pairs. The first reads NR == 1 and states that if we are processing the first record (pattern), then print the record (default action). The second is a bit trickier. Its pattern is clear; the action, on the other hand, needs some explaining.

awk has output statements that can redirect their output: POSIX awk defines three redirections (>, >> and |), and GNU awk adds a fourth (|&). One of these reads print expression | cmd. It essentially means that awk writes the output to a stream that is piped as input to the command cmd. awk keeps writing to that same stream until it is explicitly closed with a close(cmd) statement or until awk terminates.
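As a minimal sketch of what close(cmd) changes, the program below closes the pipe in an END block so that sort finishes before a closing line is printed. The sample rows and the closing line are hypothetical; the string passed to close() must match the command string used in the print statement exactly.

```shell
# Hypothetical sample input, used only for illustration.
data=$(mktemp)
printf '%s\n' 'id name city' '1 bob paris' '2 amy oslo' '3 cat bern' > "$data"

result=$(awk '
    NR == 1                          # header goes straight to standard output
    NR > 1 { print $0 | "sort -k3" } # data rows are fed to a single sort process
    END {
        close("sort -k3")            # end of input for sort: it emits its sorted output now
        print "-- end of report --"  # printed only after sort has finished
    }' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

With these rows, the sorted block should come out in the order bern, oslo, paris, followed by the closing line. Without the close() call, the closing line could appear before the sorted rows, depending on how output is buffered.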

In the case of the OP, the action reads { print $0 | "sort -k3" }, meaning that each matching record $0 is written to a stream that serves as input to the shell command sort -k3. Because sort must read all of its input before it can write anything, its output appears only when the program finishes and the stream is closed.

Recap: the command of the OP prints the first line of a file and sorts the remaining lines on a key that starts at the third field and runs to the end of the line.
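Note that sort -k3 uses a key that starts at the third field and extends to the end of the line, whereas sort -k3,3 restricts the key to the third field alone. The sketch below contrasts the two on made-up rows with a fourth column (the file contents are an assumption for illustration).

```shell
# Hypothetical sample input with a tie in the third field.
rows=$(mktemp)
printf '%s\n' 'id name city zone' '1 bob paris z' '2 amy paris a' '3 cat bern m' > "$rows"

# Key runs from field 3 to the end of the line, so "zone" breaks the paris tie.
to_end=$(awk 'NR == 1; NR > 1 { print $0 | "sort -k3" }' "$rows")

# Key is field 3 only; equal keys fall back to comparing the whole lines.
field_only=$(awk 'NR == 1; NR > 1 { print $0 | "sort -k3,3" }' "$rows")

printf '%s\n' "$to_end" '' "$field_only"
rm -f "$rows"
```

In the first result the two paris rows are ordered by their zone column (a before z); in the second they are ordered by the full line, so the row starting with 1 comes first.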

Alternative solutions:

Using GNU awk, the sorting can be done inside awk itself:

awk 'FNR==1{print; next} {a[$3 SUBSEP FNR]=$0}
     END{PROCINFO["sorted_in"]="@ind_str_asc"
        for(i in a) print a[i]
     }' file

Using a shell such as bash, where read without a variable name stores the line in REPLY, an alternative is:

cat file | (read -r; printf "%s\n" "$REPLY"; sort -k3)

kvantour

| is one of the redirections supported by print and printf; in this case it pipes the output to the command sort -k3. You can also use a redirection to write to a file using >:

awk 'NR == 1; NR > 1 {print $0 > "output.txt"}'

or append to file using >>:

awk 'NR == 1; NR > 1 {print $0 >> "output.txt"}'

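The first writes all lines except the first to output.txt, replacing any previous content (the file is truncated once, when awk first opens it, and later prints in the same run are added after that). The second appends all lines except the first to whatever output.txt already contains.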
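The difference can be seen in a small sketch. The temporary file, its initial content, and the awk variable f are assumptions made for illustration; the header is simply skipped here to keep the focus on the file contents.

```shell
# Hypothetical target file that already has some content.
out=$(mktemp)
printf 'old content\n' > "$out"

# ">" truncates the file once, when awk first opens it; both rows end up in it.
printf '%s\n' 'header' 'a' 'b' | awk -v f="$out" 'NR > 1 { print $0 > f }'
after_overwrite=$(cat "$out")

# ">>" keeps what is already in the file and adds to it.
printf '%s\n' 'header' 'c' | awk -v f="$out" 'NR > 1 { print $0 >> f }'
after_append=$(cat "$out")

printf '%s\n' "$after_overwrite" '' "$after_append"
rm -f "$out"
```

After the first command the old content is gone and the file holds a and b; after the second it holds a, b and c.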

Daweo