
I have an associative array in awk that gets populated like this:

chr_count[$3]++

When I try to print my chr_counts, I use this:

for (i in chr_count) {
    print i,":",chr_count[i];
}

Not surprisingly, though, the keys come out in no particular order, because awk leaves the order of a for (i in ...) loop unspecified. Is there an easy way to iterate over the sorted keys of chr_count?

codeforester
lonestar21
    See http://stackoverflow.com/a/5345056/69663 – if you have gawk 4, `PROCINFO["sorted_in"] = "@val_num_asc"` and similar settings are very simple to use. The manual lists many options (ascending/descending, by value/key, numerically/as strings, your own comparison function, etc.): https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning – unhammer May 12 '16 at 17:54

5 Answers


Instead of asort, use asorti(source, destination). It sorts the indices into a new array, so you don't have to copy the indices into another array yourself first.

You can then use the elements of the destination array as keys into the source array.

For your example, you would use it like this (in the END block, once the counting is done):

n = asorti(chr_count, sorted)    # sorted[1..n] now holds the keys in ascending string order
for (i = 1; i <= n; i++) {
    print sorted[i] " : " chr_count[sorted[i]]
}
Community
Dennis Williamson
    Wow, totally forgot about that despite reading right past it in the docs. This is definitely the better answer. – Cascabel Mar 16 '10 at 22:00
    `asorti` doesn't work with nawk-20121220-2.fc20.x86_64. – Cristian Ciupitu Aug 01 '14 at 14:11
    @CristianCiupitu: Sorry `asorti` is GAWK-specific. In fact, I don't think `nawk` has any built-in sort functions. – Dennis Williamson Aug 01 '14 at 16:04
    GNU Awk's [documentation](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html) mentions that indeed: "asort() and asorti() are gawk extensions; they are not available in compatibility mode (see [Options](https://www.gnu.org/software/gawk/manual/html_node/Options.html))". – Cristian Ciupitu Aug 01 '14 at 16:09

You can pipe the output through the sort command, e.g.:

for ( i in data )
 print i ":", data[i]  | "sort"
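
A slightly fuller sketch of this approach for the question's chr_count array, assuming whitespace-separated input with the chromosome name in field 3. The wrapper function count_by_chr and the sample data are hypothetical. Calling close() on the same command string flushes the pipe and waits for sort to finish, so the sorted lines appear before anything printed after the loop. Setting LC_ALL=C makes sort compare plain bytes regardless of locale. sort compares whole lines as strings, so chr10 sorts before chr2.

```sh
#!/bin/sh
# Count field 3, then print "key : count" lines sorted by key via sort(1).
# The awk program uses no gawk extensions, so any POSIX awk should work.
count_by_chr() {
    awk '
        { chr_count[$3]++ }
        END {
            cmd = "LC_ALL=C sort"      # byte-order sort, independent of locale
            for (i in chr_count)
                print i " : " chr_count[i] | cmd
            close(cmd)                 # flush the pipe and wait for sort to finish
            print "(end of list)"      # printed only after the sorted lines
        }
    ' "$@"
}

# Hypothetical sample input: the chromosome name is in the third column.
printf '%s\n' 'r1 x chr2' 'r2 x chr1' 'r3 x chr2' 'r4 x chr10' | count_by_chr
```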
krishna murti

I recently came across this issue and found that with gawk I could set PROCINFO["sorted_in"] to control the iteration order. Searching for PROCINFO online, I found the list of valid values on this GNU Awk User's Guide page: https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html

This lists options of the form @{ind|val}_{num|type|str}_{asc|desc} with:

  • ind sorts by key (index), val sorts by value.
  • num sorts numerically, str sorts as strings, and type (only valid with val) sorts by the type of each value: numbers before strings before subarrays.
  • asc gives ascending order, desc descending order.

I simply used:

PROCINFO["sorted_in"] = "@val_num_desc"
for (i in map) print i, map[i]

And the output was sorted in descending order of values.

lynxlynxlynx
Joe F

Note that asort() and asorti() are gawk extensions; other awks such as mawk or nawk (BWK awk) do not have them. For plain awk, you can write your own sort function or borrow one from elsewhere.
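
A hand-rolled version for plain awk might look like the following sketch. It copies the keys into a numerically indexed array and sorts them with a simple insertion sort. That is quadratic, but fine for a handful of keys such as chromosome names. The function names isort and sorted_counts, and the sample data, are hypothetical.

```sh
#!/bin/sh
# Plain POSIX awk, no asort()/asorti(): copy the keys into a numerically
# indexed array, sort that array by hand, then walk it in order.
sorted_counts() {
    awk '
        # Insertion sort of keys[1..n] in place, ascending, comparing as strings.
        # i, j and t are locals (extra parameters are the awk idiom for locals).
        function isort(keys, n,    i, j, t) {
            for (i = 2; i <= n; i++) {
                t = keys[i]
                for (j = i - 1; j >= 1 && (keys[j] "") > (t ""); j--)
                    keys[j + 1] = keys[j]
                keys[j + 1] = t
            }
        }
        { chr_count[$3]++ }
        END {
            n = 0
            for (k in chr_count)
                keys[++n] = k          # copy the keys; this order is unspecified
            isort(keys, n)
            for (i = 1; i <= n; i++)
                print keys[i], ":", chr_count[keys[i]]
        }
    ' "$@"
}

# Hypothetical sample input: the chromosome name is in the third column.
printf '%s\n' 'r1 x chr2' 'r2 x chr1' 'r3 x chr2' 'r4 x chr10' | sorted_counts
```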

Endre
mr. fixit

This is adapted from the gawk documentation:

# in the END block, after populating the array data (e.g. data[$3]++ per line)
# copy indices
j = 1
for (i in data) {
    ind[j] = i    # index value becomes element value
    j++
}
n = asort(ind)    # index values are now sorted
for (i = 1; i <= n; i++) {
    key = ind[i]            # work with sorted indices directly
    print key, data[key]    # access original array via sorted indices
}
Cascabel
  • Watch out, this solution is flawed as this ends up losing keys that have the same values in the original array. The accepted solution from this other thread has an idea on how to workaround that: http://stackoverflow.com/a/5345056/95750 – haridsv Nov 30 '15 at 06:12
    @haridsv No, I don't think so. This question is about sorting by the keys, not the values, and there can't be two values for the same key, so there's no issue here. The other question you point to is about sorting by values (which indeed may not all be distinct), so if you tried to use this code for that, it'd be a problem. But this isn't flawed if you use it for what it's written for. – Cascabel Nov 30 '15 at 06:26
    Apologies. I misread the indexing code as "flipping" keys and values, but after rereading it, I noticed that you are using a constantly increasing number as the index, not the original value. Thank you for getting back and clarifying it. – haridsv Nov 30 '15 at 06:42