
I would like to create a header for a file in the BEGIN part of my awk script, but to do that I need to know how many fields there are. I could add an NR==1 check to the main block, but that condition would be evaluated for every row, slowing things down.

Below is my attempt using a one-liner.

fields.txt

a   1
b   2
c   3

Result:

awk 'NR==1{a=NF; print "before begin, there are ", a, "fields"}BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
there are   fields
before begin, there are  2 fields
2   a   1
2   b   2
2   c   3
there were 2 fields

I guess the BEGIN block still gets evaluated before the block that precedes it in the script. Have I really accomplished my goal, or is the NR==1 check still getting evaluated on each line?

EDIT: Just to put in perspective why I'm trying to do it the way I am:

  1. I've got a file with say 100k rows and 40 columns
  2. This file is the output of another process in a pipeline, with the awk script being the last step
  3. I'm calculating two new columns based on the existing columns and adding these to the output
  4. I want the final file to include a header that reflects the two newly added columns (see the sketch below)
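
To make that concrete, here is a rough sketch of the kind of script I'm after; the header names (col1..colN, sum, ratio) and the placeholder calculations are made up for illustration:

awk '
  NR==1 {
    # build a header: col1..colNF for the existing fields, plus the two new ones
    for (i = 1; i <= NF; i++) printf "col%d\t", i
    print "sum\tratio"
  }
  {
    # placeholder calculations for the two added columns
    sum   = $2 + 0
    ratio = ($2 != 0) ? 1 / $2 : 0
    print $0 "\t" sum "\t" ratio
  }
' fields.txt
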
abalter
  • It is just not possible within the `BEGIN` block – Inian Jan 03 '17 at 18:32
  • `BEGIN` occurs before the processing of the input, period. Technically a file doesn't have a number of fields (at least from awk's perspective) only a row. `NR==1` is the right way to go if you want the count of fields in the first row. – JNevill Jan 03 '17 at 18:34
  • don't you want to check each row for number of fields? – karakfa Jan 03 '17 at 18:42
  • Is the `NR==1` going to be evaluated as each line is read, or just once? – abalter Jan 03 '17 at 18:47
  • @karakfa -- each row is going to have the same number of fields, but I won't know that number in advance. I want to create a header that has the same number of fields as the rest of the rows. – abalter Jan 03 '17 at 18:48
  • agree that BEGIN can't calc this number for you. While not "cool" awk, you can do `{ if (NR==1) { column_header stuff } else {record processing}} inFile` . This test is trivial in the overall processing. If you want to prove this wrong, you can spend 2-3 hours processing a terabyte sized file where you have hardcoded the header, versus the `NR==1` test. If it takes more than an extra minute I'll buy you a soda ;-) Good luck. – shellter Jan 03 '17 at 19:00
  • `NR==1` test will be evaluated for each line but I doubt there will be a noticeable difference if there is other processing. On the other hand, if you're not processing there is no point of doing this in `awk`. – karakfa Jan 03 '17 at 19:00
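
A quick way to check the overhead discussed in the comments above is to time a pass over a large file with and without the extra test; the file name and sizes below are made up:

# generate a throwaway file with 100k rows and 40 columns (made-up sizes)
awk 'BEGIN{for(r=0;r<100000;r++){for(c=1;c<=40;c++)printf "%d\t",c; print ""}}' > big.txt

# compare a plain pass against one that also runs the NR==1 test
time awk '{print}' big.txt > /dev/null
time awk 'NR==1{n=NF}{print}' big.txt > /dev/null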

2 Answers


It sounds like this is what you're trying to do:

awk '
  BEGIN {if ((getline < ARGV[1]) > 0) a=NF; print "there are", a, "fields"}
  {print a"\t"$0}
  END {print "there were", a, "fields"}
' file
there are 2 fields
2       a   1
2       b   2
2       c   3
there were 2 fields

but I don't know if it's worthwhile given the tiny performance impact of an NR==1 check relative to whatever other transformations you're going to perform on the data.

Make sure you read and fully understand all of the implications of using getline at http://awk.freeshell.org/AllAboutGetline if you're considering using it.

Ed Morton

I'm not sure if awk doing the NR==1 check on each row would really slow it down much at all. If that really is a concern, then perhaps do your initial field count outside of your current awk script and pass it into the awk script as a variable. Something like:

fieldCount=`head -1 fields.txt | awk '{print NF}'`
awk -v a="$fieldCount" 'BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
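
With the sample fields.txt above, this should print something like:

there are  2 fields
2       a   1
2       b   2
2       c   3
there were 2 fields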
JNevill