1

I'm studying awk pretty fiercely to write a git diffn implementation which will show line numbers for git diff, and I want confirmation on whether or not this Wikipedia page on awk is wrong [Update: I've now fixed this part of that Wikipedia page, but this is what it used to say]:

(pattern)
{
   print 3+2
   print foobar(3)
   print foobar(variable)
   print sin(3-2)
}

Output may be sent to a file:

(pattern)
{
   print "expression" > "file name"
}

or through a pipe:

(pattern)
{
   print "expression" | "command"
}

Notice (pattern) is above the opening brace. I'm pretty sure this is wrong but need to know for certain before editing the page. What I think that page should look like is this:

/regex_pattern/ {
    print 3+2
    print foobar(3)
    print foobar(variable)
    print sin(3-2)
}

Output may be sent to a file:

/regex_pattern/ {
    print "expression" > "file name"
}

or through a pipe:

/regex_pattern/ {
    print "expression" | "command"
}

Here's a test to "prove" it. I'm on Linux Ubuntu 18.04.

1. test_awk.sh

gawk \
'
BEGIN
{
    print "START OF AWK PROGRAM"
}
'

Test and error output:

$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
gawk: cmd. line:3: BEGIN blocks must have an action part

But with this:

2. test_awk.sh

gawk \
'
BEGIN {
    print "START OF AWK PROGRAM"
}
'

It works fine!:

$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
START OF AWK PROGRAM

Another example (fails to provide expected output):

3. test_awk.sh

gawk \
'
/hey/ 
{
    print $0
}
'

Erroneous output:

$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey1
hello
hey2
hey2

But like this:

4. test_awk.sh

gawk \
'
/hey/ {
    print $0
}
'

It works as expected:

$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey2

Updates: after solving this problem, I just added these sections below:

Learning material:

  1. In the process of working on this problem, I just spent several hours and created these examples: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/tree/master/awk. These examples, comments, and links would prove useful to anyone getting started learning awk/gawk.

Related:

  1. git diff with line numbers and proper code alignment/indentation
  2. "BEGIN blocks must have an action part" error in awk script
  3. The whole point of me learning awk at all in the first place was to write git diffn. I just got it done: Git diff with line numbers (Git log with line numbers)
Gabriel Staples
  • 1
    If you google that error you get a lot of hits, like https://stackoverflow.com/questions/27776583/begin-blocks-must-have-an-action-part-error-in-awk-script – Jetchisel May 23 '20 at 22:17
  • 1
    There's a lot wrong with and a lot missing from that wikipedia page (and ditto for most other online resources), ignore it and just get the book Effective Awk Programming, 4th Edition, by Arnold Robbins if you want to learn AWK. But in all cases, any time you see `pattern { action }` it should be `condition { action }` instead since, for example `x == 3` is a condition, not a "pattern". – Ed Morton May 24 '20 at 02:27
  • Nah, I'm good. I got the job done I needed to get done, and I'm done with `awk` for a while. This was my purpose of learning `awk` at all this week: https://stackoverflow.com/questions/24455377/git-diff-with-line-numbers-git-log-with-line-numbers/61997003#61997003. Also, I recommend people, myself included, take the time to fix Wikipedia as able. I'll keep this tool handy for sure for future projects though, and may look into that book in the future if I ever require it. That being said, I dunno, 5 after-work days and 30 hrs into `awk`, I feel pretty competent to do the basics now as it is. – Gabriel Staples May 25 '20 at 07:40
  • On 2nd thought, I may have sounded kind of arrogant. I want to let you know: thank you for the book reference. I actually bookmarked it when I first saw your comment the other day, and added it to my Amazon wishlist so I can find it later. I don't think the internet material to learn awk is insufficient: I just did it enough to feel good about my accomplishments. However, you are right: a well-written book **dramatically** speeds up the learning process in general over using just the internet. That's one reason why I work so hard to make the internet as good as a good book. – Gabriel Staples May 25 '20 at 07:50

2 Answers

4

I agree with you that the Wikipedia page is wrong. The correct syntax is spelled out in the awk manual:

A pattern-action statement has the form

     pattern { action }

A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.

...

Statements are terminated by semicolons, newlines or right braces.

This is from the man page for the default awk on my Mac. The same information is in the GNU awk manual; it's just buried a little deeper. And the POSIX specification of awk states:

An awk program is composed of pairs of the form:

pattern { action }

Either the pattern or the action (including the enclosing brace characters) can be omitted.

A missing pattern shall match any record of input, and a missing action shall be equivalent to:

{ print }
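
Both defaults are easy to check from the shell. This is only a minimal sketch: the sample input is borrowed from the question, and the variable names are my own, not anything from the manuals.

```shell
# Sample input reused from the question (three records).
input='hey1
hello
hey2'

# Pattern only, no action: the default action { print } applies.
pattern_only=$(printf '%s\n' "$input" | awk '/hey/')

# Action only, no pattern: the action runs for every record.
action_only=$(printf '%s\n' "$input" | awk '{ print }')

printf 'pattern only:\n%s\n' "$pattern_only"
printf 'action only:\n%s\n' "$action_only"
```

The first command should print only the two lines containing "hey"; the second should echo the input unchanged.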
glenn jackman
  • This is also not strictly correct: `awk '/foo/{print 1} /bar/{print 2}'` is a valid script. However, if the first action or the second pattern were missing, a semicolon would be needed in between. – karakfa May 23 '20 at 23:37
  • @karakfa I've carefully read the POSIX specification and I couldn't reach a conclusion as to if `awk '/foo/{print 1} /bar/{print 2}'` is allowed. Can you point me how you concluded that? – Quasímodo May 24 '20 at 00:51
  • @Quasímodo: Well, many years of experience... Just try it yourself. As long as the parser can tell where one statement ends and where the next condition/pattern starts, it's unambiguous syntax. I'm not sure which `awk`s support this but I think this is now de facto behavior. – karakfa May 24 '20 at 01:21
  • I've fixed the Wikipedia page. Thanks all. – Gabriel Staples May 24 '20 at 01:36
  • I missed an important bit about statement terminators that covers the one-liner case – glenn jackman May 24 '20 at 01:50
  • A bit of ambiguity there about which statements are statements. – glenn jackman May 24 '20 at 02:05
  • @karakfa I completely agree with you that it is reasonable and unambiguous, however `sed '/pattern/{H;p}' file` is not POSIX-compliant. A semicolon or newline between `p` and `}` is required, even though it is unambiguous and our parsers understand the statement as we do. – Quasímodo May 24 '20 at 12:23
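
The one-liner debated in the comments above can be tried directly. This is a sketch with made-up two-line input and my own variable names. It shows what gawk and mawk accept in practice. It does not settle whether POSIX strictly guarantees it.

```shell
# Two rules on one line, with no semicolon between them.
# The closing brace of the first action ends that rule.
one_liner=$(printf 'foo\nbar\n' | awk '/foo/{print 1} /bar/{print 2}')

# Same program with an explicit semicolon separator, for comparison.
with_semicolon=$(printf 'foo\nbar\n' | awk '/foo/{print 1}; /bar/{print 2}')

printf 'no semicolon:\n%s\n' "$one_liner"
printf 'with semicolon:\n%s\n' "$with_semicolon"
```

Both variants should print 1 and then 2, one per line.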
3

You can see in your examples that, instead of ending statements with semicolons, you can separate them with newlines. When you have

/regex/
{ ...
}

it's equivalent to /regex/; {...}, which in turn is equivalent to /regex/{print $0} {...}, and that is the behavior you observed in your test.

Note that BEGIN and END are special patterns, and they need an explicit action, since the default action {print $0} makes no sense for BEGIN (no record has been read yet). That's why the opening curly brace must be on the same line as BEGIN. It may come down to convenience, but it's all consistent.
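
To make the equivalence concrete, here is a small sketch that runs the split form from the question next to the explicit two-rule form and compares the results. The variable names are only for illustration.

```shell
# Sample input reused from the question.
input='hey1
hello
hey2'

# Pattern and action on separate lines: awk parses this as two rules.
split_rules=$(printf '%s\n' "$input" | awk '
/hey/
{
    print $0
}
')

# The same program written out explicitly as two rules.
explicit_rules=$(printf '%s\n' "$input" | awk '/hey/ { print $0 } { print $0 }')

# Report whether the two programs produced identical output.
if [ "$split_rules" = "$explicit_rules" ]; then
    echo "same output"
else
    echo "different output"
fi
```

Both programs print each matching line twice (once per rule) and each non-matching line once, which is the "erroneous" output from the third example in the question.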

karakfa
  • I see, and `/regex/{print $0} {...}` is equivalent to `/regex/{print $0}; /.*/{...}`, so the first part (`/regex/{print $0};`) prints if it matches the `regex` regular expression, and the 2nd part (`/.*/{...}`) just prints everything, since it matches everything, since that's the default behavior for awk if you don't specify a matcher. I read that here: https://www.gnu.org/software/gawk/manual/html_node/Very-Simple.html. "If the pattern is omitted, then the action is performed for _every_ input line." That explains why my 3rd example prints `hey1 hey1 hello hey2 hey2`. – Gabriel Staples May 23 '20 at 22:38
  • 3
    the part before the action can be a regex or a boolean condition. Usually, it's considered as `1{...}` instead of all match regex. That's why as a convention a standalone `1` is used for printing the record unconditionally. – karakfa May 23 '20 at 22:41
  • Thanks. To answer the Wikipedia page question: it sounds like you are agreeing with me. Can you confirm the Wikipedia page is wrong, and my suggested edit would fix it? – Gabriel Staples May 23 '20 at 22:44
  • 1
    Yes, it's wrong. Someone pretty printed the scripts I guess... – karakfa May 23 '20 at 23:33
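
The point from the comments that the part before the action is a general condition, not only a regex, can be sketched as follows. The input and variable names are assumptions chosen for illustration.

```shell
# Sample input reused from the question.
input='hey1
hello
hey2'

# A constant true condition with no action prints every record.
print_all=$(printf '%s\n' "$input" | awk '1')

# Any boolean expression works as the condition, not just a regex.
second_only=$(printf '%s\n' "$input" | awk 'NR == 2')

# Condition plus an explicit action on the same line.
long_lines=$(printf '%s\n' "$input" | awk 'length($0) > 4 { print NR ": " $0 }')

printf '%s\n' "$print_all" "$second_only" "$long_lines"
```

Here `awk '1'` echoes the input, `NR == 2` selects only the second record, and the last rule prints the record number and text of every line longer than four characters.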