
I would like to create a header for a file in the BEGIN part of my awk script, but to do that I need to know how many fields there are. I could add an NR==1 check to the main block, but that condition would be evaluated for every row, slowing things down.

Below is my attempt using a one-liner.

fields.txt

a   1
b   2
c   3

Result:

awk 'NR==1{a=NF; print "before begin, there are ", a, "fields"}BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
there are   fields
before begin, there are  2 fields
2   a   1
2   b   2
2   c   3
there were 2 fields

I guess the BEGIN block still gets evaluated before the block that precedes it in the script. Have I really accomplished my goal, or is the NR==1 check still getting evaluated on each line?

EDIT: Just to put in perspective why I'm trying to do it the way I am:

  1. I've got a file with say 100k rows and 40 columns
  2. This file is the output of another process in a pipeline, with the awk script being the last step
  3. I'm calculating two new columns based on the existing columns and adding these to the output
  4. I want the final file to include a header that reflects the two newly added columns (see the sketch below)
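
To make that concrete, here is a rough sketch of the kind of script I'm after; the header names (col1..colN, sum, ratio) and the placeholder calculations are made up for illustration:

awk '
  NR==1 {
    # build a header: col1..colNF for the existing fields, plus the two new ones
    for (i = 1; i <= NF; i++) printf "col%d\t", i
    print "sum\tratio"
  }
  {
    # placeholder calculations for the two added columns
    sum   = $2 + 0
    ratio = ($2 != 0) ? 1 / $2 : 0
    print $0 "\t" sum "\t" ratio
  }
' fields.txt
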
abalter
  • It is just not possible within the `BEGIN` block – Inian Jan 03 '17 at 18:32
  • `BEGIN` occurs before the processing of the input, period. Technically a file doesn't have a number of fields (at least from awk's perspective) only a row. `NR==1` is the right way to go if you want the count of fields in the first row. – JNevill Jan 03 '17 at 18:34
  • don't you want to check each row for number of fields? – karakfa Jan 03 '17 at 18:42
  • Is the `NR==1` going to be evaluated as each line is read, or just once? – abalter Jan 03 '17 at 18:47
  • @karakfa -- each row is going to have the same number of fields, but I won't know that number in advance. I want to create a header that has the same number of fields as the rest of the rows. – abalter Jan 03 '17 at 18:48
  • agree that BEGIN can't calc this number for you. While not "cool" awk, you can do `{ if (NR==1) { column_header stuff } else {record processing}} inFile` . This test is trivial in the overall processing. If you want to prove this wrong, you can spend 2-3 hours processing a terabyte sized file where you have hardcoded the header, versus the `NR==1` test. If it takes more than an extra minute I'll buy you a soda ;-) Good luck. – shellter Jan 03 '17 at 19:00
  • `NR==1` test will be evaluated for each line but I doubt there will be a noticeable difference if there is other processing. On the other hand, if you're not processing there is no point of doing this in `awk`. – karakfa Jan 03 '17 at 19:00
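
A quick way to check the overhead discussed in the comments above is to time a pass over a large file with and without the extra test; the file name and sizes below are made up:

# generate a throwaway file with 100k rows and 40 columns (made-up sizes)
awk 'BEGIN{for(r=0;r<100000;r++){for(c=1;c<=40;c++)printf "%d\t",c; print ""}}' > big.txt

# compare a plain pass against one that also runs the NR==1 test
time awk '{print}' big.txt > /dev/null
time awk 'NR==1{n=NF}{print}' big.txt > /dev/null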

2 Answers


It sounds like this is what you're trying to do:

awk '
  BEGIN {if ((getline < ARGV[1]) > 0) a=NF; print "there are", a, "fields"}
  {print a"\t"$0}
  END {print "there were", a, "fields"}
' file
there are 2 fields
2       a   1
2       b   2
2       c   3
there were 2 fields

but I don't know if it's worthwhile given the tiny performance impact of an NR==1 check relative to whatever other transformations you're going to perform on the data.

Make sure you read and fully understand all of the implications of using getline at http://awk.freeshell.org/AllAboutGetline if you're considering using it.

Ed Morton

I'm not sure if awk doing the NR==1 check on each row would really slow it down much at all. If that really is a concern, then perhaps do your initial field count outside of your current awk script and pass it into the awk script as a variable. Something like:

fieldCount=`head -1 fields.txt | awk '{print NF}'`
awk -v a="$fieldCount" 'BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
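
With the sample fields.txt above, this should print something like:

there are  2 fields
2       a   1
2       b   2
2       c   3
there were 2 fields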
JNevill