I have a file with the following data within for example:

20        V     70000003d120f88  1                            2
20        V     70000003d120f88  2                            2
20x00     V     70000003d120f88  2                            2
10020     V     70000003d120f88  1                            5

I want to get the sum of the 4th column data.

Using the command below I can achieve this, but the row 20x00 is excluded. I want every row whose first column starts with 20 to be summed, and nothing that has other characters before the 20, so 20* for example:

cat testdata.out | awk '{if ($1 == '20') print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'

The output value must be:

5

How can I achieve this using awk? The attempt below does not work either:

cat testdata.out | awk '$1 ~ /'20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}' 
Christopher Karsten

2 Answers

5

There is no need to use 3 processes; everything can be done in a single awk process. Check it out:

awk '$1 ~ /^20/ { a+=$4 } END { print a }'  testdata.out

explanation:

$1 ~ /^20/   checks to see if $1 starts with 20
if yes, we add $4 in the variable a
finally, we print the variable a

result 5
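
To illustrate why the `^` anchor matters, here is a minimal sketch that runs the unanchored and the anchored pattern over the sample rows from the question. The variable names `data`, `unanchored` and `anchored` are only for illustration.

```shell
# Sample rows copied from the question.
data='20        V     70000003d120f88  1                            2
20        V     70000003d120f88  2                            2
20x00     V     70000003d120f88  2                            2
10020     V     70000003d120f88  1                            5'

# /20/ matches "20" anywhere in $1, so the 10020 row is summed as well.
unanchored=$(printf '%s\n' "$data" | awk '$1 ~ /20/ { s += $4 } END { print s+0 }')

# /^20/ matches only when $1 begins with "20".
anchored=$(printf '%s\n' "$data" | awk '$1 ~ /^20/ { s += $4 } END { print s+0 }')

echo "unanchored=$unanchored anchored=$anchored"
# prints: unanchored=6 anchored=5
```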

EDIT:

Ed Morton rightly points out that the output should always be numeric, which can be achieved by adding 0 to the result. If you need to distinguish a result of 0 caused by no matches (exit status 0) from a result produced by actual matches, including matches with only zero values (exit status 1), you can also set the exit status. The exit status can then be checked with, e.g., echo $? right after the command. The code would look like this:

awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }'  testdata.out
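
As a minimal sketch of how that exit status could be inspected, assuming the sample data from the question is written to a temporary file. The `/^99/` pattern is a made-up prefix that matches nothing, and the variable names are only for illustration.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
20        V     70000003d120f88  1                            2
20        V     70000003d120f88  2                            2
20x00     V     70000003d120f88  2                            2
10020     V     70000003d120f88  1                            5
EOF

# Rows starting with 20 exist, so awk exits with status 1.
status=0
sum=$(awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' "$tmp") || status=$?
echo "sum=$sum status=$status"                  # prints: sum=5 status=1

# No row starts with 99, so a stays unset and awk exits with status 0.
none_status=0
none_sum=$(awk '$1 ~ /^99/ { a+=$4 } END { print a+0; exit(a!="") }' "$tmp") || none_status=$?
echo "sum=$none_sum status=$none_status"        # prints: sum=0 status=0

rm -f "$tmp"
```

Note that this convention is the reverse of grep's, where status 0 means a match was found; swap the test to `exit(a=="")` if the grep-style meaning is preferred.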
Slawomir Dziuba
  • `print a+0` so you get `0` instead of a blank line output if/when no lines match `/^20/`. – Ed Morton Mar 01 '21 at 16:35
  • @Ed-Morton generally yes. But we lose the distinction as to whether the hits were $4=0 only or nothing was hit. Depends on what is needed. – Slawomir Dziuba Mar 01 '21 at 17:46
  • A command that produces numeric output should always produce numeric output. For reference on standard/expected behavior from such scripts try `seq 3 | grep -c 2` vs `seq 3 | grep -c 5` and note that the output is always numeric whether a match was found or not. You could set an appropriate exit status by adding `; exit (a!="")` after the print if you want a found something (success) vs found nothing (failure) exit status like grep also provides. – Ed Morton Mar 01 '21 at 18:12
-4

Figured it out:

cat testdata.out | awk '$1 ~ /'^20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}' 

The above might not work in all cases, but the following will suffice:

i=20
cat testdata.out | awk '{if ($1 == "'"$i"'" || $1 == ""'"${i}"'"x00") print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
Christopher Karsten
  • You shouldn't need to pipe cat through to awk and then back into awk again. – Raman Sailopal Mar 01 '21 at 11:54
  • And that is not how you give awk access to the value of shell variables, see [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script). – Ed Morton Mar 01 '21 at 16:36
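
One common way to do what the last comment describes is awk's standard `-v` option, sketched below under a few assumptions: `prefix` is a hypothetical shell variable, `p` is the awk variable it is copied into, and `index($1, p) == 1` is used as a literal "starts with" test so the value is not interpreted as a regular expression.

```shell
prefix=20   # hypothetical shell variable holding the wanted prefix

# Feed the sample rows from the question to a single awk process.
sum=$(printf '%s\n' \
  '20        V     70000003d120f88  1                            2' \
  '20        V     70000003d120f88  2                            2' \
  '20x00     V     70000003d120f88  2                            2' \
  '10020     V     70000003d120f88  1                            5' |
  awk -v p="$prefix" 'index($1, p) == 1 { s += $4 } END { print s+0 }')

echo "$sum"   # prints: 5
```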