1

I have an x y z matrix of the format:

1 1 0.02
1 2 0.10
1 4 0.22
2 1 0.70
2 2 0.22
3 2 0.44
3 3 0.42

...and so on. I'm interested in summing all of the z values (column 3) for a particular x value (column 1) and printing the output on separate lines (with the x value as a prefix), such that the output for the previous example would appear as:

1 0.34
2 0.92
3 0.86

I have a strong feeling that awk is the right tool for the job, but knowledge of awk is really lacking and I'd really appreciate any help that anyone can offer.

Thanks in advance.

1 Answers1

3

I agree that awk is a good tool for this job — this is pretty much exactly the sort of task it was designed for.

awk '{ sum[$1] += $3 } END { for (i in sum) print i, sum[i] }' data

For the given data, I got:

2 0.92
3 0.86
1 0.34

Clearly, you could pipe the output to sort -n and get the results in sorted order after all.

To get that in sorted order with awk, you have to go outside the realm of POSIX awk and use the GNU awk extension function asorti:

gawk '{ sum[$1] += $3 }
      END { n = asorti(sum, map); for (i = 1; i <= n; i++) print map[i], sum[map[i]] }' data  

Output:

1 0.34
2 0.92
3 0.86
Community
  • 1
  • 1
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278