
The following works well and captures all 2nd column values for S_nn. The goal is to add numbers in the 2nd column.

awk -F "," '/s_/ {cons = cons + $2} END {print cons}' G.csv

How can I change this to add only the lines where nn is between N1 and N2, e.g. from s_23 to s_24?

Also, is it possible to count a line as 1 if it has junk instead of a number in the 2nd column?

S_22, 1
S_23, 0
S_24, 1
S_25, 1
S_26, ?

Sample input: sum s_24 to s_26

Sample output: 1+1+1=3 (the last one is for error)

Tims

1 Answer


The solution is rather simple: all you need is a numeric range test on the number that follows the underscore.

awk -v start=24 -v stop=26 '
     BEGIN { FS="[_,]" }
     (start <= $2 ) && ($2 <= stop) { s = s + (($3==$3+0)?$3:1) }
     END{ print s+0 }' <file>

which outputs

3

How does it work:

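  • Line 1 defines the variables start and stop through the -v option.
  • The BEGIN block redefines the field separator as either _ or , so each line now has 3 fields: S, the number, and the value.
  • The next line checks whether field 2 (the number) lies between start and stop; if so, the value is added to the sum s.
  • Field 3 is tested for being a number with the condition $3==$3+0; if the test fails, the value is taken to be 1.
  • The END block prints s+0, so that 0 is printed instead of an empty line when no line matched.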
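To see the numeric test in isolation, here is a minimal sketch that only classifies field 3 of each line instead of summing it. The three input lines are taken from the question's sample; the variable name classified and the labels "number" and "junk" are made up for illustration.

```shell
# Feed three sample lines to awk and label field 3 of each one.
# $3==$3+0 is true only when field 3 looks like a number.
classified=$(printf 'S_24, 1\nS_25, 1\nS_26, ?\n' | awk '
    BEGIN { FS = "[_,]" }                                   # split on _ or ,
    { print $2, (($3 == $3+0) ? "number" : "junk") }')
echo "$classified"
```

This should print "24 number", "25 number" and "26 junk" on separate lines. The leading space in front of each value does not matter, because awk ignores surrounding blanks when deciding whether a field looks numeric.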

If you want to see the numbers printed, you can do:

awk -v start=24 -v stop=26 '
     BEGIN{ FS="[_,]" }
     (start <= $2 ) && ($2 <= stop) {
        v = ($3==$3+0)?$3:1
        s = s + v
        printf "%s%d", (c++?"+":""), v
     }
     END{ printf "=%d\n", s }' <file>

Output:

1+1+1=3

The printf statement prints "+" followed by the value v every time except the first, when it prints only v. This is tracked with a counter c, which evaluates to zero as long as it has not been set. The expression (c++?"+":"") decides whether we are printing the first entry or not. c++ is the post-increment operator: it returns the current value of c and afterwards sets c to c+1. Thus, the first time c is 0, so (c++?"+":"") returns "" and sets c to 1. The second time, (c++?"+":"") returns "+" and sets c to 2.
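The separator idiom can be tried on its own. This is a minimal sketch that joins three arbitrary input lines with "+"; the input a, b, c and the variable name joined are made up for illustration.

```shell
# c++ is 0 (false) on the first line and non-zero afterwards,
# so the "+" is printed before every entry except the first.
joined=$(printf 'a\nb\nc\n' | awk '
    { printf "%s%s", (c++ ? "+" : ""), $0 }
    END { print "" }')                      # finish with a newline
echo "$joined"
```

This should print a+b+c. Printing the separator before each entry rather than after it means there is no trailing "+" to remove at the end.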

kvantour