AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.
AWK is an interpreted programming language (AWK stands for Aho, Weinberger, and Kernighan) designed for text processing and typically used as data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.
Source: Wikipedia.
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of one or more commands, separated by a semi-colon ;
character. The input is split into records, and each record is split into fields (by default, records are separated by the newline character and fields by horizontal whitespace.) Per record, each condition is checked and, if true, the commands in the action block are executed. Within the action block, fields are accessed by a 1-based index – e.g. $2
for the second field. If the condition is missing, the action block will always be executed. If the condition is present but the action block is absent, the default action is print $0
which is to print the current line after any transformations. Since a non-zero number is equivalent to true, then awk '1' file
instructs awk to perform the default action (print
) for every line.
Awk can have an optional BEGIN
and optional END
, where the BEGIN
action is invoked before reading any input, and END
action is invoked after all input is read:
BEGIN { action }
condition { action }
condition { action }
...
END { action }
Awk was originally developed by Alfred Aho, Brian Kernighan and Peter Weinberger in 1977 and updated in 1985. Since then, various versions and dialects of awk have emerged. The most common are :
- awk - the most common and will be found on most Unix-like systems. It also has a well defined IEEE standard.
- mawk - a fast AWK implementation which it's code base is based on a byte-code interpreter.
- nawk - during the development of AWK, the developers released a new version (new awk) to avoid confusion but it is itself now very old and lacking functionality present in all POSIX awks.
- gawk - Also known as GNU awk. The only version in which the developers attempted to add i18n support. Allowed users to write their own C shared libraries to extend it with their own "plug-ins". This version is the standard implementation for Linux.
When asking questions about data processing using awk, please include complete input and desired output.
Some frequently occurring themes:
- Parsing a CSV file using gawk
- How do I use shell variables in an awk script?
- Printing only the first field in a string
- How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?
- Why does my tool output overwrite itself and how do I fix it?
- Printing with sed or awk a line following a matching pattern
- AllAboutGetline
- How to remove a field from a line
Books:
- The AWK Programming Language by Aho, Kernighan & Weinberger (archive.org link)
- Effective AWK, 4th edition by Robbins (see The GNU AWK Users Guide below for latest online version)
- Effective AWK, 3rd edition by Robbins
- Sed & Awk, 2nd edition by Dougherty & Robbins
- Sed & Awk Pocket Reference, 2nd Edition by Arnold Robbins
- AWK Language Programming - free book
- Awk One-Liners Explained
- GNU AWK one-liners by Sundeep Agarwal (includes a chapter on regular expressions)
Resources:
- Awk.Info (archive.org link)
- The GNU Awk User's Guide
- POSIX specification of awk
- Idiomatic awk
- The awk programming language tutorial site
- Awk one-liners
- Awk one-liners explained
Other StackExchange Resources:
- tagged awk on Unix & Linux
- tagged awk on Ask Ubuntu
- tagged awk on Super User
- tagged awk on Server Fault