The following command outputs the header of a file and sorts the records after the header. But how does it work? Can anyone explain this command?
awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'
Could you please go through the following once (it is for explanation purposes only). To learn more awk concepts,
I suggest going through Stack Overflow's nice awk learning section.
awk '                        ##Start of the awk program.
NR == 1;                     ##If this is the first line, print it.
                             ##awk works in condition-then-action pairs; since NO ACTION is given here, the default action (printing the current line) runs.
NR > 1{                      ##For every line after the first, do the following.
  print $0 | "sort -k3"      ##Pipe the line to sort -k3, which sorts on the key starting at the 3rd field and writes its output once awk closes the pipe at exit.
}'
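To see the effect on concrete input, here is a small sketch. The three-column sample data (name, dept, age) is made up for illustration and is not from the question.

```shell
# Hypothetical sample: a header line followed by three unsorted records.
data=$(printf '%s\n' 'name dept age' 'carol ops 41' 'alice dev 29' 'bob qa 35')

# The header is printed by awk itself; the other lines go through sort -k3.
result=$(printf '%s\n' "$data" | awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}')

printf '%s\n' "$result"
```

The header stays on top and the records below it come out ordered by the third column (29, 35, 41).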
Understanding the awk command:
Overall, an awk program is built out of (pattern){action} pairs, which state that if pattern evaluates to true (a non-zero number or a non-empty string), action is executed. One does not necessarily need to write both. If pattern is omitted, it defaults to 1, and if action is omitted, it defaults to print $0.
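The two defaults can be checked with a quick sketch. The input lines and the variable names (input, only_pattern, only_action, both) are illustrative only.

```shell
# Hypothetical two-column input.
input=$(printf '%s\n' 'a 1' 'b 2' 'c 3')

# Pattern only: the action defaults to print $0, so the second line is printed.
only_pattern=$(printf '%s\n' "$input" | awk 'NR == 2')

# Action only: the pattern defaults to true, so the action runs for every line.
only_action=$(printf '%s\n' "$input" | awk '{print $1}')

# Both given: the action runs only for lines whose second field is greater than 1.
both=$(printf '%s\n' "$input" | awk '$2 > 1 {print $1}')

printf '%s\n' "$only_pattern" "$only_action" "$both"
```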
When looking at the command in question:
awk 'NR == 1; NR > 1 {print $0 | "sort -k3"}'
We notice that there are two pattern-action pairs. The first reads NR == 1
and states that if we are processing the first record (pattern), then print the record (default action). The second is a bit more tricky. The pattern is clear (every record after the first); the action, on the other hand, needs some explaining.
awk's print and printf statements support a few output redirections (>, >> and |; GNU awk adds |&). One of these reads expression | cmd. It means that awk writes the output to a stream that is piped as input to the command cmd. awk keeps writing to that stream until it is explicitly closed with a close(cmd)
statement or until awk terminates.
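A minimal sketch of the effect of close(cmd), using a BEGIN-only program so that no input file is needed. The variable name cmd and the words being sorted are illustrative.

```shell
result=$(awk 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd
    print "apple" | cmd
    close(cmd)        # sort sees end of input here and writes its result
    print "done"      # printed by awk itself, after sort has finished
}')

printf '%s\n' "$result"
```

Because close(cmd) waits for sort to finish, the sorted words appear before the final line. Without the close call, sort would only produce its output when awk exits, and the relative order of the two outputs would depend on buffering.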
In the OP's case, the action reads { print $0 | "sort -k3" }, meaning that it prints every record $0 to a stream that is used as input of the shell command sort -k3. Because sort has to read all of its input before it can order it, it writes its output only once awk closes the stream, which here happens when the program finishes.
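Recap: the OP's command prints the first line of a file unchanged and sorts the remaining lines on the key that starts at the third column (-k3 runs from field 3 to the end of the line; use -k3,3 to restrict the key to the third column alone).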
Alternative solutions:
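Using GNU awk, you can do the sorting inside awk itself:
awk 'FNR==1 {print; next}                       # print the header and keep it out of the array
{a[$3] = ($3 in a ? a[$3] ORS : "") $0}         # group records by third field; records with the same key are appended
END{PROCINFO["sorted_in"]="@ind_str_asc"        # GNU awk only: traverse the indices in ascending string order
for(i in a) print a[i]
}' file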
Using pure shell (no awk), you can do:
{ IFS= read -r header; printf "%s\n" "$header"; sort -k3; } < file
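A quick check of the shell-only approach on made-up data. This sketch reads the header into a named variable and takes the input by redirection; the temporary file and its contents are assumptions for illustration.

```shell
# Hypothetical input file with a header and two unsorted records.
tmp=$(mktemp)
printf '%s\n' 'name dept age' 'carol ops 41' 'alice dev 29' > "$tmp"

# read consumes exactly the first line; sort then sees only the rest.
result=$( { IFS= read -r header; printf '%s\n' "$header"; sort -k3; } < "$tmp" )
rm -f "$tmp"

printf '%s\n' "$result"
```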
| is one of the redirections supported by print and printf; in this case it pipes the output to the command sort -k3. You might also use a redirection to write to a file using >:
awk 'NR == 1; NR > 1 {print $0 > "output.txt"}'
or append to a file using >>:
awk 'NR == 1; NR > 1 {print $0 >> "output.txt"}'
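The first writes all lines but the first to output.txt, replacing any previous contents (awk truncates the file once, when it first opens it, and later prints in the same run go to the same open file). The second appends all lines but the first to whatever output.txt already contains.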
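The difference between the two redirections can be seen in a short sketch. The temporary file stands in for output.txt, and the awk variable f and the sample lines are illustrative.

```shell
# Temporary file with some pre-existing content.
out=$(mktemp)
printf 'old\n' > "$out"

# > replaces the old content; both printed lines end up in the file.
printf '%s\n' header x y | awk -v f="$out" 'NR > 1 {print $0 > f}'
truncated=$(cat "$out")

# >> keeps what is already there and adds to the end.
printf '%s\n' header z | awk -v f="$out" 'NR > 1 {print $0 >> f}'
appended=$(cat "$out")

rm -f "$out"
printf '%s\n' "$truncated" "--" "$appended"
```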