AWK: Pattern match multiline data with variable line number

Question

I am trying to write a script which will analyze data from a pipe. The problem is, a single element is described in a variable number of lines. Look at the example data set:

3 14 -30.48 17.23
4  1 -18.01 12.69
4  3 -11.01  2.69
8 12 -21.14 -8.76
8 14 -18.01 -5.69
8 12 -35.14 -1.76
9  2  -1.01 22.69
10 1 -88.88 17.28
10 1   -.88 14.28
10 1   5.88  1.28
10 1  -8.88 -7.28

In this case, the first field defines the event to which the rest of the line belongs. Event number 8, for example, has data on 3 lines. To simplify the rather complex problem that I am trying to solve, let us imagine that I want to calculate the following expression:

sum_i($2 * ($3 + $4))

Where i is taken over all lines belonging to a given element. The output I want to produce would then look like:

3=-185.5   [14(-30.48+17.23) ]
4=-30.28   [1(-18.01+12.69) + 3(-11.01+2.69)]
8=-1106.4  [...]

I thus need a script which reads all the lines that have the same index entry.

I am an AWK newbie and started learning the language a couple of days ago. I am now uncertain whether I will be able to achieve what I want. Therefore:

Is this doable with AWK?
If not, with what? SED?
If yes, how? I would be grateful if someone provided a link describing how this can be implemented.

Finally, I know that there is a similar question, "Can awk patterns match multiple lines?". However, I do not have a constant pattern that separates my data.

Thanks!

It is perfectly doable in awk and do not even try to do it in sed : ) What is the desired output? — fedorqui, Dec 23 '15 at 13:19
For example the sum I have quoted and the index number (the first column) — Sasha, Dec 23 '15 at 13:31

bian · Answer 1 · 2015-12-23T13:58:04.697

3

awk 'id!=$1{if(id){print id"="sum;sum=0};id=$1}{sum+=$2*($3+$4)} END{print id"="sum}' file
3=-185.5
4=-30.28
8=-1133.4
9=43.36
10=-67.2

edited Dec 23 '15 at 13:58

answered Dec 23 '15 at 13:49

bian


You sure "id" can never be zero? What other numbers is `id` guaranteed not to be? Test `if (id!="")` or even clearer `if (NR>1)`, not `if(id)`, and move `id=$1` out of the action block that's only executed when `id!=$1` and into the `sum+=...` block. – Ed Morton Dec 23 '15 at 16:10

Thanks @EdMorton. `awk 'id!=$1&&NR>1{print id"="sum;sum=0}{id=$1;sum+=$2*($3+$4)} END{print id"="sum}'` – bian Dec 24 '15 at 01:13
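To illustrate the point raised in the comments, here is a minimal sketch of the revised one-liner run on input whose first event id is 0. The function name `sum_by_event` and the sample rows with id 0 are made up for illustration; only the awk logic comes from the thread. The `NR > 0` guard in `END` is an extra assumption so that empty input prints nothing.

```shell
# Hypothetical wrapper: reads the four-column data on stdin.
sum_by_event() {
  awk '
    # Key changed (and this is not the first line): flush the previous group.
    NR > 1 && $1 != id { print id "=" sum; sum = 0 }
    # Every line: remember the current key and accumulate its term.
    { id = $1; sum += $2 * ($3 + $4) }
    # End of input: flush the last group, unless there was no input at all.
    END { if (NR > 0) print id "=" sum }
  '
}

# Made-up rows for event 0, followed by the first row from the question.
printf '%s\n' '0 2 1.5 0.5' '0 1 1 1' '3 14 -30.48 17.23' | sum_by_event
# prints:
# 0=6
# 3=-185.5
```

With the original `if(id)` test, the group for id 0 would never be printed and its sum would leak into the next group, because 0 is false in awk.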

F. Knorr · Accepted Answer · 2015-12-23T19:34:20.830

You could try this:

awk '{ar[$1]+=$2*($3+$4)}
      END{for (key in ar) 
              {print key"="ar[key]}}' inputFile

For each input line we do the desired calculation and add the result to an array element, with $1 serving as the array key.
When the entire file has been read, we print the results in the END{...} block.

The output for the given sample input is:

4=-30.28
8=-1133.4
9=43.36
10=-67.2
3=-185.5

If the output needs to be sorted, have a look at gawk's asorti function or the sort command (e.g. awk '{...}' inputFile | sort -n). The order in which for (key in ar) visits the keys is not specified.
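As a sketch of the `sort` route, the array-based program can be piped through a numeric sort on the key. The wrapper name `sum_sorted` and the shuffled sample rows are chosen here for illustration; the rows themselves are taken from the question's data. This assumes a POSIX `sort`, using `-t=` to split on the equals sign and `-k1,1n` to compare only the key numerically.

```shell
# Hypothetical wrapper: sums per key, then orders the lines by numeric key.
# Reads the named files, or stdin when called without arguments.
sum_sorted() {
  awk '{ ar[$1] += $2 * ($3 + $4) }                     # accumulate per key
       END { for (key in ar) print key "=" ar[key] }' "$@" |
    sort -t= -k1,1n                                     # numeric sort on the key only
}

# Deliberately unsorted, interleaved input.
printf '%s\n' '10 1 5.88 1.28' '4 1 -18.01 12.69' \
              '10 1 -8.88 -7.28' '4 3 -11.01 2.69' | sum_sorted
# prints:
# 4=-30.28
# 10=-9
```

Restricting the sort key to the first field keeps the ordering independent of the sums after the equals sign.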

This solution does not require that the input is sorted.

Thanks! This is exactly what I wanted to know. – Sasha Dec 23 '15 at 22:12

score 1 · Answer 3 · answered Dec 23 '15 at 14:19


yet another similar awk

$ awk -v OFS="=" 'NR==1{p=$1}
                  p!=$1{print p,s; s=0; p=$1}
                       {s+=$2*($3+$4)}
                    END{print p,s}' file

3=-185.5
4=-30.28
8=-1133.4
9=43.36
10=-67.2

ps. Your calculation for "8" seems off.
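One way to check that remark is to print each term for a single event before summing. This is only a sketch and not part of the original answer: the function name `event_terms` and the variable `ev` are made up for illustration.

```shell
# Hypothetical helper: show every term of sum_i($2 * ($3 + $4)) for one event id.
event_terms() {
  awk -v ev="$1" '
    $1 == ev {
      term = $2 * ($3 + $4)                              # contribution of this line
      sum += term
      printf "%s*(%s+%s) = %.1f\n", $2, $3, $4, term
    }
    END { printf "total = %.1f\n", sum }
  '
}

# The three lines for event 8 from the question, plus one line that is skipped.
printf '%s\n' '8 12 -21.14 -8.76' '8 14 -18.01 -5.69' \
              '8 12 -35.14 -1.76' '9 2 -1.01 22.69' | event_terms 8
# prints:
# 12*(-21.14+-8.76) = -358.8
# 14*(-18.01+-5.69) = -331.8
# 12*(-35.14+-1.76) = -442.8
# total = -1133.4
```

The three terms add up to -1133.4, which matches the output of the scripts rather than the -1106.4 given in the question.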

answered Dec 23 '15 at 14:19

karakfa


Yeah, thanks for noticing. It seems manual calculation skills are fading away. – Sasha Dec 23 '15 at 22:08
