I've just read "Can awk patterns match multiple lines?", whose accepted solution is a script that prints the line following first half.

how
second half #1
now
first half
second half #2
brown
second half #3
cow
/second half/ {
  if(lastLine == "first half") {
    print
  }
}

{ lastLine = $0 }

This gives second half #2.

I can't understand why { lastLine = $0 } has to go after /second half/ {...}. I tried swapping them, and nothing was printed.

{ lastLine = $0 }

/second half/ {
  if(lastLine == "first half") {
    print
  }
}

I tried reading man awk, but it doesn't cover state machines. Searching "awk state machine" gives only the linked SO question.

  • The if statement refers to variable lastLine, so obviously it matters whether lastLine gets its value before or after the if (unless I didn't understand your question at all). – Eran Ben-Natan Apr 28 '19 at 06:44
  • In the 2nd script, `lastLine` gets the value before `if`, but I don't understand why there's nothing printed out. – GNUSupporter 8964民主女神 地下教會 Apr 28 '19 at 06:52
  • In the 2nd script, `lastLine` is set to the current line (i.e. `$0`), and the action coming after it is executed only if the current line matches `second half`. In the action, the `lastLine` variable is checked against `first half`, which is always false because `second half ...` =/= `first half`. Thus nothing gets printed. – oguz ismail Apr 28 '19 at 07:12
  • @oguzismail Thanks for the response, but I still don't get why that's different from the 1st script. In the 1st script, `lastLine` is also set to the current line `$0`. – GNUSupporter 8964民主女神 地下教會 Apr 28 '19 at 07:22
  • Look, in 1st script `lastLine` keeps *last line*, but for 2nd it keeps *current line* instead, okay? because in 1st, it is assigned *after* checking if it is `first half`, in 2nd, it is assigned *before* checking if it is `first half`. – oguz ismail Apr 28 '19 at 07:28
  • @oguzismail Thanks again for the explanation. I hope I understand you correctly: for the 1st script, at the 1st line `/.../{if(lastLine == "first half"){...}}` is skipped, then `lastLine` gets the value `$0`, then the same repeats for the second line, ..., until `/.../{if(...){...}}` is matched, so that `lastLine` holds the value of the previous line. The examples of AWK variables that I've seen involve only one single action, so I've scratched my head over this. How can I learn more about this? I've gone through GNU's user manual and I can't find an example like this. – GNUSupporter 8964民主女神 地下教會 Apr 28 '19 at 07:55
  • By trial and error, like everybody else. – oguz ismail Apr 28 '19 at 08:20
  • How to write a state machine is a general programming thing, it's not an awk thing. See https://en.wikipedia.org/wiki/Finite-state_machine (and a very old paper by a very young man and his peers at https://ieeexplore.ieee.org/document/6772875 if you care :-) ) – Ed Morton Apr 28 '19 at 14:22

1 Answer

This is answered by @oguzismail's comment. To clear this question from the unanswered queue, I'm going to expand it into an answer.

AWK processes text record by record. By default, the record separator (RS) is the newline character \n, so AWK treats each line as a record. For every record, AWK runs through the rules of the script in the order they are written.
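As a minimal sketch of that behaviour (the two-line input and the "rule 1"/"rule 2" labels are made up for illustration and are not part of the question), the following shows that each record passes through every rule, top to bottom, before the next record is read:

```shell
# Two pattern-less rules: both run for every record, in the order written.
out=$(printf 'a\nb\n' | awk '
  { print "rule 1 sees: " $0 }
  { print "rule 2 sees: " $0 }
')
printf '%s\n' "$out"
# rule 1 sees: a
# rule 2 sees: a
# rule 1 sees: b
# rule 2 sees: b
```

The order of rules in the script therefore decides what a variable holds at the moment a later rule reads it.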

In the first (correct) AWK script, when the first record how is processed, the pattern /second half/ in

/second half/ {
  if(lastLine == "first half") {
    print
  }
}

evaluates to false, so that action is skipped. The rule { lastLine = $0 } has no pattern, so it runs for every record and saves the current record $0 (i.e. how) to the variable lastLine.

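Then the second record, second half #1, arrives. It matches /second half/, so the block {if (lastLine == "first half"){...}} is executed while lastLine still holds the previous record (how), even though $0 is second half #1. The comparison fails, nothing is printed, and only afterwards does { lastLine = $0 } overwrite lastLine with the current record.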

As processing continues, the record second half #2 arrives while lastLine still holds first half, so the if condition is true and the record is printed.
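The walk-through above can be checked with a small trace. This is only a sketch: it reuses the sample input from the question, and the print rule is a tracing rule added here for illustration, not part of the original script.

```shell
input='how
second half #1
now
first half
second half #2
brown
second half #3
cow'

trace=$(printf '%s\n' "$input" | awk '
  # Tracing rule (added for illustration): show what lastLine holds
  # when a matching record arrives, before it is overwritten.
  /second half/ { print $0 " | lastLine was: " lastLine }

  # Runs for every record, after the rule above.
  { lastLine = $0 }
')
printf '%s\n' "$trace"
# second half #1 | lastLine was: how
# second half #2 | lastLine was: first half
# second half #3 | lastLine was: brown
```

Only for second half #2 does lastLine equal first half, which is why the original script prints that record alone.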

If I swap the two rules in the AWK script, lastLine is set to the current record $0 before the /second half/ rule runs. That rule runs only when $0 contains second half, and such a record can never equal first half, so the if condition is always false. Therefore, the second AWK script can never print anything.
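For a side-by-side check, here is a sketch that runs both orderings on the sample input from the question (the shell variable names first and swapped are made up for this example):

```shell
input='how
second half #1
now
first half
second half #2
brown
second half #3
cow'

# Original order: compare against the previous record, then remember the current one.
first=$(printf '%s\n' "$input" | awk '
  /second half/ { if (lastLine == "first half") print }
  { lastLine = $0 }
')

# Swapped order: lastLine already equals $0 when the comparison runs.
swapped=$(printf '%s\n' "$input" | awk '
  { lastLine = $0 }
  /second half/ { if (lastLine == "first half") print }
')

printf 'original order: [%s]\n' "$first"    # original order: [second half #2]
printf 'swapped order:  [%s]\n' "$swapped"  # swapped order:  []
```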