
I am trying to write a script that analyzes data from a pipe. The problem is that a single element is described by a variable number of lines. Consider the example data set:

3 14 -30.48 17.23
4  1 -18.01 12.69
4  3 -11.01  2.69
8 12 -21.14 -8.76
8 14 -18.01 -5.69
8 12 -35.14 -1.76
9  2  -1.01 22.69
10 1 -88.88 17.28
10 1   -.88 14.28
10 1   5.88  1.28
10 1  -8.88 -7.28

Here, the first field of each line identifies the event to which that line's data belongs. Event number 8, for example, has data on 3 lines. To simplify the rather complex problem that I am trying to solve, let us imagine that I want to calculate the following expression:

sum_i($2 * ($3 + $4))

where i runs over all lines belonging to a given element. The output I want to produce would then look like:

3=-185.5   [14(-30.48+17.23) ]
4=-30.28   [1(-18.01+12.69) + 3(-11.01+2.69)]
8=-1106.4  [...]

I thus need a script which reads all the lines that have the same index entry.

I am an AWK newbie and started learning the language only a couple of days ago, so I am uncertain whether I will be able to achieve what I want. Therefore:

  1. Is this doable with AWK?
  2. If not, with what? SED?
  3. If yes, how? I would be grateful for a link describing how this can be implemented.

Finally, I know that there is a similar question, "Can awk patterns match multiple lines?". However, I do not have a constant pattern that separates my data.

Thanks!

Sasha

3 Answers

awk 'id!=$1{if(id){print id"="sum;sum=0};id=$1}{sum+=$2*($3+$4)} END{print id"="sum}' file
3=-185.5
4=-30.28
8=-1133.4
9=43.36
10=-67.2
bian
  • You sure "id" can never be zero? What other numbers is `id` guaranteed not to be? Test `if (id!="")` or even clearer `if (NR>1)`, not `if(id)`, and move `id=$1` out of the action block that's only executed when `id!=$1` and into the `sum+=...` block. – Ed Morton Dec 23 '15 at 16:10
  • Thanks @EdMorton. `awk 'id!=$1&&NR>1{print id"="sum;sum=0}{id=$1;sum+=$2*($3+$4)} END{print id"="sum}'` – bian Dec 24 '15 at 01:13

You could try this:

awk '{ar[$1]+=$2*($3+$4)}
      END{for (key in ar) 
              {print key"="ar[key]}}' inputFile

For each input line, we perform the desired calculation and add the result to an array element; $1 serves as the array key.
When the entire file has been read, we print the results in the END{...} block.

The output for the given sample input is:

4=-30.28
8=-1133.4
9=43.36
10=-67.2
3=-185.5

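The for (key in ar) loop visits the keys in an unspecified order, which is why the lines above are not sorted. If sorted output is required, you might want to have a look at gawk's asorti function or pipe the result through the sort command (e.g. awk '{...}' inputFile | sort -n).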
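As a minimal sketch of the sort variant, the pipeline could be wrapped like this. The function name sum_sorted and the three sample lines are only illustrative assumptions; the awk program itself is the one shown earlier in this answer, reading from standard input.

```shell
# Sum $2*($3+$4) per event ID, then order the result lines numerically by ID.
# sum_sorted is a hypothetical helper name; it reads its data from stdin.
sum_sorted() {
    awk '{ ar[$1] += $2 * ($3 + $4) }
         END { for (key in ar) print key "=" ar[key] }' | sort -n
}

# Deliberately unsorted sample input.
printf '%s\n' '8 12 -21.14 -8.76' '3 14 -30.48 17.23' '8 14 -18.01 -5.69' | sum_sorted
```

sort -n compares the leading number of each line, so the "=sum" suffix does not disturb the ordering by ID.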

This solution does not require that the input is sorted.
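If the events should instead be printed in the order in which they first appear in the input, one possible approach is to record each new key in a second array. This is only a sketch: the names order, n and sum_first_seen are made up for illustration and are not part of the original one-liner.

```shell
# Print per-event sums in first-seen order, without relying on sort.
# sum_first_seen, order and n are hypothetical names used for this sketch.
sum_first_seen() {
    awk '!($1 in ar) { order[++n] = $1 }    # remember each ID the first time it shows up
         { ar[$1] += $2 * ($3 + $4) }       # accumulate the per-event sum
         END { for (i = 1; i <= n; i++) print order[i] "=" ar[order[i]] }'
}

# Event 8 appears first, so it is printed first even though 3 < 8.
printf '%s\n' '8 12 -21.14 -8.76' '3 14 -30.48 17.23' '8 14 -18.01 -5.69' | sum_first_seen
```

The first rule must come before the accumulation rule, because the `in` test only reports a key as new while ar has no entry for it yet.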

F. Knorr

Yet another similar awk solution:

$ awk -v OFS="=" 'NR==1{p=$1}
                  p!=$1{print p,s; s=0; p=$1}
                       {s+=$2*($3+$4)}
                    END{print p,s}' file

3=-185.5
4=-30.28
8=-1133.4
9=43.36
10=-67.2

P.S. Your calculation for "8" seems off: 12(-21.14-8.76) + 14(-18.01-5.69) + 12(-35.14-1.76) = -1133.4, not -1106.4.

karakfa