I have a set of files (FILE1.txt, FILE2.txt ...) of the form:

foo 123
bar 456
start
foo 321
bar 654

And I want to ignore everything before start and only read lines containing foo in each file.

My attempt is this command:

awk '/start/,/EOF/ {if($1=="foo"){print $2}} ' FILE*.txt

And it actually works on the first file (it prints 321 and skips the foo 123 before start), but then the range pattern seems to be ignored for the following files. Assuming all the files have the same content shown above, it prints:

$ awk '/start/,/EOF/ {if($1=="foo"){print $2}} ' FILE*.txt

321 // Expected from FILE1.txt, successfully ignore the first "foo" before "start".
123 // Unexpected from FILE2.txt
321 // Expected from FILE2.txt
123 // Unexpected from FILE3.txt
321 // Expected from FILE3.txt
...

What am I doing wrong? How can I make the range pattern work on each file, and not only once over all the files? I've actually found a workaround based on find, but for the sake of understanding I'm looking for a solution relying on awk only.

purple

3 Answers

awk processes all files as a single input stream. You need to tell awk when it's processing a new file and reset its matching state yourself.

One approach:

awk '
FNR==1             { found=0 }          # FNR==1st record of new file, reset flag
/start/            { found=1 }          # found start of range, set flag
found && $1=="foo" { print $2 }         # if flag set and 1st field == "foo" then print 2nd field
' FILE?.txt

NOTES:

  • /start/ will match the string start anywhere in the row, e.g., it will match restart, last time I started the car, etc.; to match the exact string you could use $1=="start"
  • this was run against 3 files (FILE{1..3}.txt) that all have the same content as OP's sample input

This generates:

321
321
321
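A minimal sketch of the exact-match variant from the first note, assuming sample files that also contain a restart line before start (that extra line, and the scratch directory from mktemp, are added only for this demo):

```shell
#!/bin/sh
# Scratch directory with three copies of the sample input, plus a
# "restart" line that a plain /start/ regex would also match
dir=$(mktemp -d)
for i in 1 2 3; do
  printf 'restart\nfoo 123\nbar 456\nstart\nfoo 321\nbar 654\n' > "$dir/FILE$i.txt"
done

out=$(awk '
FNR==1             { found=0 }   # first record of a new file: reset the flag
$1=="start"        { found=1 }   # exact match on field 1, so "restart" is ignored
found && $1=="foo" { print $2 }  # flag set: print the 2nd field
' "$dir"/FILE?.txt)

printf '%s\n' "$out"   # 321 once per file
rm -r "$dir"
```

With /start/ instead of $1=="start", the same files would also print 123 from each file, because restart contains start.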
markp-fuso
  • Thank you. However I don't understand why awk would evaluate /start/ many times in a single stream and /start/,/end/ only once. But actually it echoes what I've read here. WDYT? https://stackoverflow.com/questions/23934486/is-a-start-end-range-expression-ever-useful-in-awk – purple Nov 25 '22 at 14:49
  • your 'end of range' is based on finding the *string* `'EOF'` which does not exist anywhere in your input so the very first `'start'` found in the input stream tells `awk` to process the *rest of the input stream* with `{if($1=="foo"){print $2}}` – markp-fuso Nov 25 '22 at 14:52
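One way to see this, assuming two copies of the question's sample file in a scratch directory: tag every record the range covers with FILENAME and FNR. The range opens at start in FILE1.txt and, since /EOF/ never matches, it is still open when FILE2.txt begins.

```shell
#!/bin/sh
dir=$(mktemp -d)
for i in 1 2; do
  printf 'foo 123\nbar 456\nstart\nfoo 321\nbar 654\n' > "$dir/FILE$i.txt"
done

# Print every record inside the range, prefixed with its file name and per-file line number
out=$(cd "$dir" && awk '/start/,/EOF/ { print FILENAME ":" FNR ": " $0 }' FILE?.txt)
printf '%s\n' "$out"
rm -r "$dir"
```

The output starts with FILE1.txt:3: start and then keeps going through FILE2.txt:1: foo 123 and FILE2.txt:2: bar 456, which are exactly the records the question did not expect to be inside the range.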
What am I doing wrong?

/EOF/ means "the line contains the three-letter substring EOF"; it will not match the last line of a file unless that line contains the substring EOF.
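To confirm that /EOF/ is an ordinary regex rather than an end-of-file marker, one can append a literal EOF line to each sample file (done here only for the experiment). The question's original command then closes the range inside every file and prints just the expected values:

```shell
#!/bin/sh
dir=$(mktemp -d)
for i in 1 2 3; do
  # The final "EOF" line is text that the /EOF/ end pattern can actually match
  printf 'foo 123\nbar 456\nstart\nfoo 321\nbar 654\nEOF\n' > "$dir/FILE$i.txt"
done

# The question's command, unchanged
out=$(cd "$dir" && awk '/start/,/EOF/ {if($1=="foo"){print $2}} ' FILE*.txt)
printf '%s\n' "$out"   # 321 once per file
rm -r "$dir"
```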

How can I make the range pattern work on each file and not only once over all the files?

I would exploit GNU AWK in the following way. Let file1.txt content be

foo 123
bar 456
start
foo 321
bar 654

and file2.txt content be

foo 1230
bar 4560
start
foo 3210
bar 6540

and file3.txt content be

foo 12300
bar 45600
start
foo 32100
bar 65400

then

awk '/start/{f=1}f&&$1=="bar"{print}ENDFILE{f=0}' file1.txt file2.txt file3.txt

gives output

bar 654
bar 6540
bar 65400

Explanation: for a line containing start I set f to 1; when f is non-zero and the 1st column ($1) is bar I print that line; when I reach the end of a file I set f back to zero using the ENDFILE special pattern.

(tested in GNU Awk 5.0.1)
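ENDFILE is a GNU Awk extension, not part of POSIX awk. A sketch of the same command for other awk implementations, assuming the three files above: clear f on the first record of each file (FNR==1) instead of after the last one.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'foo 123\nbar 456\nstart\nfoo 321\nbar 654\n' > "$dir/file1.txt"
printf 'foo 1230\nbar 4560\nstart\nfoo 3210\nbar 6540\n' > "$dir/file2.txt"
printf 'foo 12300\nbar 45600\nstart\nfoo 32100\nbar 65400\n' > "$dir/file3.txt"

# FNR==1 is true on the first line of every file, so f is cleared before the other rules test it
out=$(cd "$dir" && awk 'FNR==1{f=0}/start/{f=1}f&&$1=="bar"{print}' file1.txt file2.txt file3.txt)
printf '%s\n' "$out"
rm -r "$dir"
```

This should print the same three bar lines as the ENDFILE version.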

Daweo
  • Thank you A LOT, now I understand my mistake, I thought EOF was some kind of built-in variable or regex that I copy-pasted somewhere. – purple Nov 25 '22 at 14:53
Some thoughts about row ranges and EOF.

One solution can be to set a helper variable.

$ awk -v row="start" -v regx="foo" '
    FNR == 1{x = 0}
    x == 1 && $1 ~ regx{print $2}
    $1 ~ row{x = 1}' file file file
321
321
321
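Because row and regx are compared with ~, they are regular expressions, so foo would also match foobar; anchoring them (^start$, ^foo$) gives exact matching. A small wrapper around the command above, using a hypothetical function name after_marker and a sample file with an extra foobar line added only to show the anchoring, might look like:

```shell
#!/bin/sh
# Hypothetical wrapper: after_marker ROW_REGEX FIELD1_REGEX FILE...
# Prints $2 of lines whose $1 matches FIELD1_REGEX, but only after a line
# whose $1 matches ROW_REGEX; the flag is reset at the start of every file
after_marker() {
  row=$1 regx=$2
  shift 2
  awk -v row="$row" -v regx="$regx" '
    FNR == 1{x = 0}
    x == 1 && $1 ~ regx{print $2}
    $1 ~ row{x = 1}' "$@"
}

dir=$(mktemp -d)
printf 'foo 123\nbar 456\nstart\nfoo 321\nfoobar 999\nbar 654\n' > "$dir/file"
out=$(after_marker '^start$' '^foo$' "$dir/file" "$dir/file" "$dir/file")
printf '%s\n' "$out"   # 321 once per file; foobar is skipped
rm -r "$dir"
```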
Andre Wildberg