18

Here's my array (gawk script) :

myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11

After sort, I need the following result :

bob    5
jack   11
peter  32
john   463

When i use "asort", indices are lost. How to sort by array value without losing indices ? (I need ordered indices based on their values)

(I need to obtain this result with awk/gawk only, not shell script, perl, etc)

If my post isn't clear enough, here is an other post explaining the same issue : http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_26626841.html )

Thanks in advance

Update :

Thanks to you both, but i need to sort by values, not indices (i want ordered indices according to their values).

In other terms, i need this result :

bob    5
jack   11
peter  32
john   463

not :

bob 5
jack 11
john 463
peter 32

(I agree, my example is confusing, the chosen values are pretty bad)

From the code of Catcall, I wrote a quick implementation that works, but it's rather ugly (I concatenate keys & values before sort and split during comparison). Here's what it looks like :

function qsort(A, left, right,   i, last) {
  if (left >= right)
    return
  swap(A, left, left+int((right-left+1)*rand()))
  last = left
  for (i = left+1; i <= right; i++)
    if (getPart(A[i], "value") < getPart(A[left], "value"))
      swap(A, ++last, i)
  swap(A, left, last)
  qsort(A, left, last-1)
  qsort(A, last+1, right)
}

function swap(A, i, j,   t) {
  t = A[i]; A[i] = A[j]; A[j] = t
}

function getPart(str, part) {
  if (part == "key")
    return substr(str, 1, index(str, "#")-1)
  if (part == "value")
    return substr(str, index(str, "#")+1, length(str))+0
  return
}

BEGIN {  }
      {  }
END {

  myArray["peter"] = 32
  myArray["bob"] = 5
  myArray["john"] = 463
  myArray["jack"] = 11

  for (key in myArray)
    sortvalues[j++] = key "#" myArray[key]

  qsort(sortvalues, 0, length(myArray));

  for (i = 1; i <= length(myArray); i++)
    print getPart(sortvalues[i], "key"), getPart(sortvalues[i], "value")
}

Of course I'm interested if you have something more clean...

Thanks for your time

Phil
  • 478
  • 1
  • 4
  • 12

6 Answers6

30

Edit:

Sort by values

Oh! To sort the values, it's a bit of a kludge, but you can create a temporary array using a concatenation of the values and the indices of the original array as indices in the new array. Then you can asorti() the temporary array and split the concatenated values back into indices and values. If you can't follow that convoluted description, the code is much easier to understand. It's also very short.

# right justify the integers into space-padded strings and cat the index
# to create the new index
for (i in myArray) tmpidx[sprintf("%12s", myArray[i]),i] = i
num = asorti(tmpidx)
j = 0
for (i=1; i<=num; i++) {
    split(tmpidx[i], tmp, SUBSEP)
    indices[++j] = tmp[2]  # tmp[2] is the name
}
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]

Edit 2:

If you have GAWK 4, you can traverse the array by order of values without performing an explicit sort:

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    PROCINFO["sorted_in"] = "@val_num_asc"

    for (i in myArray) {
        {print i, myArray[i]}}
    }

 }

There are settings for traversing by index or value, ascending or descending and other options. You can also specify a custom function.

Previous answer:

Sort by indices

If you have an AWK, such as gawk 3.1.2 or greater, which supports asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    num = asorti(myArray, indices)
    for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
}

If you don't have asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    for (i in myArray) indices[++j] = i
    num = asort(indices)
    for (i=1; i<=num; i++) print i, indices[i], myArray[indices[i]]
}
Dennis Williamson
  • 346,391
  • 90
  • 374
  • 439
  • @Phil: Please see my edited answer. I'm sorry I didn't read your question carefully. – Dennis Williamson Mar 18 '11 at 03:08
  • 2
    for your updated answer, I think the convention is to use awk's pseudo multidimensional arrays `a[myArray[i], i]` and then `split()` on `SUBSEP` as seen in one of my answers here --> http://stackoverflow.com/questions/4443583/bash-script-that-analyzes-report-files/4445671#4445671 Still a kludge to be sure but a little less so IMHO – SiegeX Mar 18 '11 at 05:36
  • 2
    @SiegeX: In this case, since it's a numeric sort, I needed to use `sprintf()` anyway, but `SUBSEP` would be a better character to use than space. I will update my answer. – Dennis Williamson Mar 18 '11 at 07:43
  • excellent, I had not thought to use right aligned sprintf to sort numbers directly with asorti! this saves me using a custom sort method. Thank you all – Phil Mar 19 '11 at 14:08
  • Is the numeric sort behavior for right-justified numeric strings documented somewhere? – haridsv Nov 30 '15 at 06:26
  • @haridsv: No, not as such. Sorting in GAWK is lexical, not numeric, and by right justifying the numbers it forces the sort to behave as if it were numeric. Note that GAWK 4 now has a facility to control array traversal order `PROCINFO["sorted_in"] ="@ind_num_asc" ` for example, without needing to explicitly sort the array. – Dennis Williamson Nov 30 '15 at 17:27
  • @haridsv : if the "numeric" indices are already right aligned, then it's probably slightly faster for `gawk` if u use `PROCINFO["sorted_in"] ="@ind_str_asc"` instead, since it skips the step of having to first convert it back to numeric before comparing – RARE Kpop Manifesto Dec 30 '22 at 18:20
6

Use the Unix sort command with the pipe, keeps Awk code simple and follow Unix philosophy
Create a input file with values seperated by comma
peter,32
jack,11
john,463
bob,5

Create a sort.awk file with the code

BEGIN { FS=","; }
{
    myArray[$1]=$2;
}
END {
    for (name in myArray)
        printf ("%s,%d\n", name, myArray[name]) | "sort -t, -k2 -n"
}

Run the program, should give you the output
$ awk -f sort.awk data
bob,5
jack,11
peter,32
john,463

Sal Kadam
  • 77
  • 1
  • 2
5
PROCINFO["sorted_in"] = "@val_num_desc";

Before iterating an array, use the above statement. But, it works in awk version 4.0.1. It does not work in awk version 3.1.7.

I am not sure in which intermediate version, it got introduced.

K Adithyan
  • 386
  • 4
  • 12
  • This could be described a bit better, but GAWK lets you tell it how you want arrays to be sorted when you say `for (a in arr) { ... }`. https://www.gnu.org/software/gawk/manual/html_node/Controlling-Array-Traversal.html#Controlling-Array-Traversal – Cameron Kerr Apr 14 '15 at 23:59
  • Just remember to call `gawk` and not just `awk`. This is the biggest for me to use gawk; I can afford not to care about non-gawk in our environment. – Cameron Kerr Apr 15 '15 at 00:06
1

And the simple answer...

function sort_by_myArray(i1, v1, i2, v2) {
    return myArray[i2] < myArray[i1];
}

BEGIN {
    myArray["peter"] = 32;
    myArray["bob"] = 5;
    myArray["john"] = 463;
    myArray["jack"] = 11;
    len = length(myArray);

    asorti(myArray, k, "sort_by_myArray");

    # Print result.
    for(n = 1; n <= len; ++n) {
            print k[n], myArray[k[n]]
    }
}
Carlo Wood
  • 5,648
  • 2
  • 35
  • 47
0

Use asorti:

#!/usr/bin/env -S gawk -f
{
    score[$1] = $0;
    array[sprintf("%3s",$2) $1] = $1;
}

END {
    asorti(array, b)
    for(i in b)
    {
        name = array[b[i]]
        print score[name]
    }
}
Edward
  • 333
  • 2
  • 12
0

The following function works in Gawk 3.1.7 and doesn't resort to any of the workarounds described above (no external sort command, no array subscripts, etc.) It's just a basic implementation of the insertion sort algorithm adapted for associative arrays.

You pass it an associative array to sort on the values and an empty array to populate with the corresponding keys.

myArray["peter"] = 32;
myArray["bob"] = 5;
myArray["john"] = 463;
myArray["jack"] = 11;

len = resort( myArray, result );

for( i = 1; i <= len; i++ ) {
    key = result[ i ];
    print i ": " key " = " myArray[ key ];
}

Here is the implementation:

function resort( data, list,
        key, len, i, j, v )
{
        len = 0;
        for( key in data ) {
                list[ ++len ] = key;
        }

        # insertion sort algorithm adapted for
        # one-based associative arrays in gawk
        for( i = 2; i <= len; i++ )
        {
                v = list[ i ];
                for( j = i - 1; j >= 1; j-- )
                {
                        if( data[ list[ j ] ] <= data[ v ] )
                                break;
                        list[ j + 1 ] = list[ j ];
                }
                list[ j + 1 ] = v;
        }

        return len;
}

You can "reverse sort" by simply iterating the resulting array in descending order.

Leslie Krause
  • 387
  • 3
  • 4