I'm missing something about awk pattern matching using flags.

Given a file:

2019 foo
 a
 b
 c
2019 bar
 d
 e
 f
2019 foobar
 g
 h
 i

I can use awk with flags and get the expected output -- awk '/foo/{flag=1;next} /^[0-9]+/{flag=0} flag' file

 a
 b
 c
 g
 h
 i

But if I remove the next so that the matched line is printed as well, then nothing is printed at all. Does awk continue from the matched line?

Using another syntax -- awk '/foo/,/2019/' file

2019 foo
2019 foobar

I was expecting awk to print the lines between the matches, including the matching lines. I'm definitely missing something about the syntax.

  • When you omit `next`, the current line also matches `^[0-9]+`, so the flag is cleared before anything is printed. The range fails for the same reason. – oguz ismail Dec 11 '19 at 21:43
  • The straightforward way of printing the line matching `foo` is to `print` it before `next`. – oguz ismail Dec 11 '19 at 21:45
  • For `/foo/,/2019/`, note that the same line matches both the start and the end pattern, so you only get these two lines. I guess your intention is `/2019 foo/,/2019 foobar/` – karakfa Dec 11 '19 at 21:57
  • {flag=1;print;next} definitely works to print the match -- ty. – Brendan Stephens Dec 12 '19 at 14:55
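The two comments from oguz ismail can be checked with a small sketch. It rebuilds the sample data from the question in a temporary file; the `sample`, `no_next` and `with_print` names are only illustrative and are not part of the original thread.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '2019 foo' ' a' ' b' ' c' \
              '2019 bar' ' d' ' e' ' f' \
              '2019 foobar' ' g' ' h' ' i' > "$sample"

# Without next: on "2019 foo" the first rule sets flag=1, then the second
# rule sees the leading digits on the same line and resets flag=0, so the
# final "flag" test is never true.
no_next=$(awk '/foo/{flag=1} /^[0-9]+/{flag=0} flag' "$sample")

# Printing the header explicitly before next keeps it in the output and
# still skips the rule that would clear the flag.
with_print=$(awk '/foo/{flag=1; print; next} /^[0-9]+/{flag=0} flag' "$sample")

printf 'without next: [%s]\n' "$no_next"
printf '%s\n' "$with_print"

rm -f "$sample"
```

The first command prints nothing, and the second prints each `foo` header followed by its indented lines.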

1 Answer

Don't use range expressions, and then you don't have to try to understand them. A flag is always a better option anyway, as it's clearer and easier to customize; see "Is a /start/,/end/ range expression ever useful in awk?".
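For anyone who still wants to see why the range in the question printed only two lines, here is a minimal sketch. It assumes the sample data from the question, written to a temporary file; the `sample`, `same_line` and `distinct` names are made up for illustration. The second command is the variant suggested in the comments on the question.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '2019 foo' ' a' ' b' ' c' \
              '2019 bar' ' d' ' e' ' f' \
              '2019 foobar' ' g' ' h' ' i' > "$sample"

# "2019 foo" matches the start pattern /foo/ and the end pattern /2019/,
# so the range opens and closes on that one line. The same happens again
# on "2019 foobar".
same_line=$(awk '/foo/,/2019/' "$sample")

# With an end pattern that the start line does not match, the range stays
# open until the "2019 foobar" line, which is printed as the last line.
distinct=$(awk '/2019 foo/,/2019 foobar/' "$sample")

printf '%s\n' "$same_line" '---' "$distinct"

rm -f "$sample"
```

Neither range gives "each foo header plus its own lines", which is the kind of customization that is simpler to express with a flag.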

Your code using a flag should probably look like this though:

$ awk 'NF==2{found=0} found; $2 ~ /foo/{found=1}' file
 a
 b
 c
 g
 h
 i

or if you really like your original conditions:

$ awk '/^[0-9]/{found=0} found; /foo/{found=1}' file
 a
 b
 c
 g
 h
 i

Naming a flag variable flag is like naming a numeric variable number instead of whatever it really represents (total, average, etc.). Don't do that - name your flag variables based on what they represent about your data, in this case that you found the line matching your target regexp. When people abbreviate a flag variable to f, that's an abbreviation for found, not for flag.

Ed Morton
  • Ty Ed, this is good information. I was using `flag` because that's what I was setting: a flag to mark a range. But I agree, it's easier to read when it's called something relatable. Using `awk '/^[0-9]/{found=0} found; /foo/{found=1}'`, I was wondering about the placement of the print statement. As above, it only prints the range. But if it's moved to the end, as in `awk '/^[0-9]/{found=0} /foo/{found=1} found'`, then it also prints the match, which is what I was looking for, but it still confuses the bejebus out of me. – Brendan Stephens Dec 12 '19 at 15:26
  • In my case I also cannot use NF because the actual data being parsed may contain a changing number of columns. – Brendan Stephens Dec 12 '19 at 15:29
  • If your real data doesn't follow the format shown in your question then you should fix your question. Wrt being confused: you have `/^[0-9]/{f=0} /foo/{f=1} f{print}`. That's just 3 separate condition/action blocks. The first 2 set/clear `f` and the 3rd one tests `f`, so when `f` is tested depends on where `f{print}` is located relative to the other 2 blocks, and whether or not it's true depends on whether the conditions they test were true and so caused `f` to get set/cleared. Take a look at the examples in https://stackoverflow.com/a/17914105/1745001, maybe that'll help. – Ed Morton Dec 12 '19 at 15:49
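The point about block order in the last comment can be tried out with a short sketch. It assumes the sample data from the question; the `sample`, `test_first`, `test_middle` and `test_last` names are hypothetical and chosen only for this illustration. The three commands contain the same rules and differ only in where the `found` test sits.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '2019 foo' ' a' ' b' ' c' \
              '2019 bar' ' d' ' e' ' f' \
              '2019 foobar' ' g' ' h' ' i' > "$sample"

# Test first: each line is judged by the flag value left over from the
# previous line, so the start header is skipped and the closing header
# "2019 bar" is printed.
test_first=$(awk 'found; /^[0-9]/{found=0} /foo/{found=1}' "$sample")

# Test between clear and set: every header clears the flag before the test
# and sets it only afterwards, so only the indented lines are printed.
test_middle=$(awk '/^[0-9]/{found=0} found; /foo/{found=1}' "$sample")

# Test last: a foo header has already set the flag again when the test
# runs, so the header is printed along with its indented lines.
test_last=$(awk '/^[0-9]/{found=0} /foo/{found=1} found' "$sample")

printf '%s\n' "$test_first" '---' "$test_middle" '---' "$test_last"

rm -f "$sample"
```

Reading each program left to right for a single header line, and tracking the value of `found` at the moment it is tested, accounts for all three outputs.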