
I might be going about this the wrong way, but I have tried every syntax I can think of and I am stuck on the closest error I could get to.

I have a log file that I want to filter down to a set of lines like this:

Files :  1  1  1  1  1
Files :  3  3  4  4  5
Files :  10 4  2  3  1
Files : 254 1  1  1  1

The code I have gets me to this point. However, I want to use awk to add up the first numeric column, which in this instance gives 268 as the output (and then perform a similar task on the other columns).

I have tried to pipe the awk output into a loop to perform the final step, but it won't add the values and throws an error instead. I thought it could be due to awk handling the entries as strings, but as bash isn't strongly typed, should that matter?

Anyway, the code is:

 x=0; 
 iconv -f UTF-16 -t UTF-8 "./TestLogs/rbTest.log" | grep "Files :" | grep -v "*.*" | egrep -v "Files : [a-zA-Z]" |awk '{$1=$1}1' OFS="," | awk -F "," '{print $4}' | while read i;
 do
    $x=$((x+=i)); 
done

Error message:

-bash: 0=1: command not found
-bash: 1=4: command not found
-bash: 4=14: command not found
-bash: 14=268: command not found

I tried a couple of the different addition syntaxes, but I feel this has more to do with what I am trying to feed it than with the addition itself. This is currently just with integer values, but I would also like to do it with floats.

Any help is much appreciated. I am sure there is a less convoluted way to achieve this; I am still learning.

Craig155
  • You assign to variables like this: `x=$((x+=i))`. That said, this won't work because your assignment is in a sub-shell. See http://stackoverflow.com/q/16854280/258523 for more on that. – Etan Reisner Sep 01 '15 at 20:01
  • I thought it was something akin to that. I have done similar things in PoSH, where I usually cheat by assigning $_ to a temp variable in the next pipe iteration. I wasn't sure how to accomplish that in Bash. Thanks for the link, it is very helpful! – Craig155 Sep 01 '15 at 20:39
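
To make the sub-shell point above concrete, here is a minimal sketch. The four values are the first numeric column from the sample lines, hard-coded as a stand-in for the real pipeline (an assumption made for illustration). The first loop sits on the right-hand side of a pipe, so in bash's default configuration it runs in a sub-shell and its changes to `x` are lost. The second loop reads from a redirection instead, so it runs in the current shell and `total` keeps its value.

```shell
# Stand-in for the output of the grep/awk pipeline (hypothetical sample data).
values=$(printf '%s\n' 1 3 10 254)

# Piped loop: in bash this body runs in a sub-shell, so x is not updated out here.
x=0
printf '%s\n' "$values" | while read -r i; do
    x=$((x + i))
done
echo "after pipe: $x"

# Redirected loop: runs in the current shell, so the sum survives the loop.
total=0
while read -r i; do
    total=$((total + i))
done <<EOF
$values
EOF
echo "after redirect: $total"   # 268
```

In bash, `done < <(some | pipeline)` gives the same effect for a real command pipeline, since process substitution also keeps the loop in the current shell.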

2 Answers


You can do computations in awk itself:

awk '{for (c=3; c<=NF; c++) sum[c]+=$c} END{printf "Total : ";
    for (c=3; c<=NF; c++) printf "%s%s", sum[c], ((c<NF)? OFS:ORS) }' file

Output:

Total : 268 9 8 9 8

Here, sum is an associative array that holds the running sum for each column from column 3 onwards.

Command breakdown:

for (c=3; c<=NF; c++)     # Iterate from 3rd col to last col
sum[c]+=$c                # Add each col value into an array sum with index of col #
END                       # Execute this block after last record
printf "Total : "         # Print literal "Total : "
for (c=3; c<=NF; c++)     # Iterate from 3rd col to last col
printf "%s%s",            # Use printf to format the output as 2 strings (%s%s)
sum[c],                   # 1st one is sum for the given index
((c<NF)? OFS:ORS)         # 2nd is conditional string. It will print OFS if it is not last
                          # col and will print ORS if it is last col.
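
As a quick check (an editorial sketch, not part of the original answer), the same awk program can be run against the four sample lines from the question, supplied here through a here-document instead of the real log file. The second command is a reduced, hypothetical example with made-up decimal values; it relies on awk doing its arithmetic in floating point, so the float case mentioned in the question needs no change to the approach.

```shell
# Run the column-summing program on the sample lines from the question.
result=$(awk '{ for (c = 3; c <= NF; c++) sum[c] += $c }
     END { printf "Total : "
           for (c = 3; c <= NF; c++) printf "%s%s", sum[c], (c < NF ? OFS : ORS) }' <<'EOF'
Files :  1  1  1  1  1
Files :  3  3  4  4  5
Files :  10 4  2  3  1
Files : 254 1  1  1  1
EOF
)
echo "$result"        # Total : 268 9 8 9 8

# Hypothetical decimal input: sum only the first numeric column ($3).
float_total=$(printf 'Files : 1.5 2\nFiles : 2.25 3\n' | awk '{ s += $3 } END { print s }')
echo "$float_total"   # 3.75
```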
anubhava
  • OK, this is really cool. I'm not sure I get all of it, but it works perfectly. So the first part says to start at the 3rd column and keep moving over until the last field, and at each field to add the entry into the array, which is indexed by the same loop counter. Then it prints the values, again moving across the array in a loop. I don't get the %s%s (is it formatting?), and the last part has me totally perplexed. Could you walk me through it, please? – Craig155 Sep 01 '15 at 20:35
  • I've added a detailed explanation to my answer. – anubhava Sep 02 '15 at 07:25

(Not an answer, but a formatted comment)

I always get antsy when I see a long pipeline of greps and awks (and seds, etc.):

... | grep "Files :" | grep -v "*.*" | egrep -v "Files : [a-zA-Z]" | awk '{$1=$1}1' OFS="," | awk -F "," '{print $4}'

This can be written as:

... | awk '/Files : [^[:alpha:]]/ && !/\*/ {print $4}'

Are you using grep -v "*.*" to filter out lines with dots, or lines with asterisks? Because you're achieving the latter.
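
A small sketch of that behaviour, using three made-up lines (one plain, one containing a dot, one containing an asterisk). In a basic regular expression a leading `*` is taken literally and the following `.*` matches anything, so the pattern matches any line containing an asterisk. The second command spells the same intent out explicitly with a fixed-string match.

```shell
# Hypothetical input: only the third line contains an asterisk.
sample=$(printf '%s\n' 'Files : 1 2' 'Files : 1.5 2' 'Files : * 2')

# The pattern from the question: drops the asterisk line, keeps the line with a dot.
kept=$(printf '%s\n' "$sample" | grep -v "*.*")
echo "$kept"

# Same result, stated explicitly: exclude lines containing a literal asterisk.
explicit=$(printf '%s\n' "$sample" | grep -vF '*')
echo "$explicit"
```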

glenn jackman
  • Oh dude, I know. I wrote that thinking there must be a way to do this without so much piping. Forgive me, I am still learning the ropes, and stringing things together is a start. I needed to get rid of the latter, so thankfully I got that right! I was trying to make sure it matched only lines with 'Files :' plus any number of spaces and then a numeric only. I see this is a two-condition statement. Can you explain the first portion, as it is a new construct to me: [^[:alpha:]]/ – Craig155 Sep 01 '15 at 20:47
  • You know, what you're doing is perfectly valid. The "unix philosophy" is to use each tool for the purpose it's designed for, and chain them together with pipes. There's probably very little performance difference between your pipeline and mine. You're definitely on the right track. awk is a nice little programming language. I'd recommend you spend some time learning it. However this bit `awk '{$1=$1}1' OFS=, | awk -F, '{print $4}'` is one awk too many – glenn jackman Sep 01 '15 at 20:58
  • This thing `[^[:alpha:]]` is not awk-specific: it's just a regular expression negated character set -- it means "the next character is not an alphabetic character". It's what you're doing with `grep "Files : " | grep -v "Files : [a-zA-Z]"`. In addition to `[:alpha:]`, you have `[:digit:]`, `[:xdigit:]`, `[:alnum:]`, `[:space:]`, `[:blank:]`, `[:punct:]`, etc. In addition to matching more than just ASCII characters, they give you a clear self-explanation about what you're matching (even if they can be a few more chars to type). – glenn jackman Sep 01 '15 at 20:58
  • Oh, cheers man, that is proper cool. I didn't realise you could write them like that. You've just made my grep/sed/awk'ing a lot easier for the future, so thank you. – Craig155 Sep 01 '15 at 21:09
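
To illustrate the character classes discussed in the comments above, here is a short sketch on made-up lines. The first grep keeps lines where the character right after "Files : " is not alphabetic. One caveat follows from how the pattern is written: when two spaces follow the colon, as in the sample lines from the question, the second space itself counts as "not alphabetic". The second grep is a stricter variant (an assumption about the intent, not something taken from the answers) that skips any whitespace and then requires a digit.

```shell
# Hypothetical lines with a single space after the colon.
matched=$(printf '%s\n' 'Files : 10 4' 'Files : name.txt' 'Other : 5 6' |
    grep 'Files : [^[:alpha:]]')
echo "$matched"   # Files : 10 4

# Stricter variant for lines with variable spacing: whitespace, then a digit.
strict=$(printf '%s\n' 'Files :  10 4' 'Files :  name.txt' |
    grep 'Files :[[:space:]]*[[:digit:]]')
echo "$strict"    # Files :  10 4
```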