26

I realize that awk has associative arrays, but I wonder if there is an awk equivalent to this:

http://php.net/manual/en/function.array-push.php

The obvious workaround is to just say:

array[$new_element] = $new_element

However, this seems less readable and more hackish than it needs to be.

shellter
merlin2011
  • 2
    I'd call that elegant and minimalist, not hackish! ;-). You can always write your own functions to manage arrays, but there is nothing built into the language for that. Good luck. – shellter May 25 '12 at 18:00
  • Storing an element at `length(A)+1` as proposed in other solutions will result in `attempt to use scalar \`A' as an array` from gawk, and would in turn require [more workarounds](https://groups.google.com/g/comp.lang.awk/c/jrRiumpwr20/m/9l_boqItAwAJ). So, in my mind, your "hackish" solution is the most portable one. – TheDudeAbides Oct 16 '22 at 19:29

3 Answers

20

I don't think an array length is immediately available in awk (at least not in the versions I fiddle around with). But you could simply maintain the length and then do something like this:

array[arraylen++] = $0;

And then access the elements via the same integer values:

for ( i = 0; i < arraylen; i++ )
   print array[i];
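
As a quick check of the counter approach above, here is a minimal sketch that appends three input lines and prints them back in insertion order. The sample input and the `out` shell variable are illustrative assumptions, not part of the original answer.

```shell
# Append each input line at the index held in a manually maintained counter.
out=$(printf 'fnord\nick\nbaz\n' | awk '
  { array[arraylen++] = $0 }          # store at index arraylen, then increment it
  END {
    for (i = 0; i < arraylen; i++)    # indices run from 0 to arraylen-1
      print i, array[i]
  }')
printf '%s\n' "$out"
```

Because the counter only ever grows, this avoids calling `length()` on the array at all, which sidesteps the portability concerns raised elsewhere in this thread.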
Mark Wilkins
  • 3
    +1 - The `length()` function in GAWK will return the number of elements in an array, but since arrays are sparse, the length isn't necessarily the last element. – Dennis Williamson May 25 '12 at 18:10
  • 4
    Just for historical reference, the `length(arrayname)` notation is not exclusive to GAWK. It was [added](https://github.com/onetrueawk/awk/blob/master/FIXES#L322) to the One True Awk in 2002. It may be that this functionality hit gawk [three years later](http://code.metager.de/source/xref/gnu/gawk/ChangeLog.0#3453). – ghoti Jan 05 '16 at 03:46
  • @ghoti - The link to github (for onetrueawk) acknowledges Arnold Robbins – Happy Green Kid Naps Jun 14 '18 at 16:56
  • @HappyGreenKidNaps - yes, and Arnold Robbins was a member of the POSIX 1003.2 balloting group, so in addition to his involvement with gawk, he helped define standards for awk. If you find yourself in Israel, do buy him a beer. :) BTW, further digging reveals that Gawk's [feature history](https://www.gnu.org/software/gawk/manual/html_node/Feature-History.html) suggests that `length()` worked on arrays as of version 3.1, which was [released in June 2001](http://git.savannah.gnu.org/cgit/gawk.git/tree/ChangeLog.0#n7232). Not sure where I got the 2005 idea, that link is broken. – ghoti Jun 14 '18 at 17:42
  • things unclear: do you expect the items in the array to maintain or end up with a particular ordering? do you have to maintain or eliminate duplicates? – tomc Oct 17 '22 at 02:32
12

In gawk you can find the length of an array with length(var) so it's not very hard to cook up your own function.

function push(A,B) { A[length(A)+1] = B }

Notice this discussion, though -- all the places I can access right now have gawk 3.1.5 so I cannot properly test my function, duh. But here is an approximation.

vnix$ gawk '# BEGIN: make sure arr is an array
>   BEGIN { delete arr[0] }
>   { print "=" length(arr); arr[length(arr)+1] = $1;
>     print length(arr), arr[length(arr)] }
>   END { print "---";
>     for (i=1; i<=length(arr); ++i) print i, arr[i] }' <<HERE
> fnord foo
> ick bar
> baz quux
> HERE
=0
1 fnord
=1
2 ick
=2
3 baz
---
1 fnord
2 ick
3 baz
Community
tripleee
  • 4
    The `A[length(A)+1]` snippet is not guaranteed to avoid collisions. It works in cases, like your example, where you only add things to the array in a predictable order. If, however, you were to delete array elements, you'd be creating gaps which reduce `length()` while leaving the highest number in place. – ghoti Aug 29 '16 at 05:56
  • I would like to do this, but do something like log rotate and have a maximum length over time. I want to track my CPU temperature, but keep only ten or twenty of the previous recordings. – nyxee Mar 19 '21 at 14:48
  • @nyxee You can use a fixed-length array as a circular buffer by incrementing the index to the last item you have updated; then the previous indices backwards are increasingly old; continue from the end of the array when you wrap past the beginning, and vice versa when incrementing the index variable. It should not be hard to find examples of this. `rrdtool` does something similar with persistent storage in a disk file if you need that. – tripleee Mar 20 '21 at 10:36
  • The same but more handy in the case it is one-time operation: array_name[length(array_name)] = "item" – elixon Dec 20 '22 at 14:23
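
The circular buffer described in the comments above could be sketched roughly as follows. This is only an illustration under assumptions: the names `buf` and `max`, the buffer size of 3, and the sample readings are all made up here, and the record number modulo the buffer size is used as the slot index.

```shell
# Keep only the last `max` input lines by reusing slots 0 .. max-1.
out=$(printf '41\n42\n43\n44\n45\n' | awk -v max=3 '
  { buf[NR % max] = $0 }              # overwrite the oldest slot once the buffer is full
  END {
    start = (NR > max) ? NR - max + 1 : 1
    for (i = start; i <= NR; i++)     # replay from the oldest kept record to the newest
      print buf[i % max]
  }')
printf '%s\n' "$out"
```

With five readings and `max=3`, only the last three (43, 44, 45) survive, printed oldest first. The array never holds more than `max` elements, so no explicit deletion is needed.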
3

As others have said, awk provides no functionality like this out of the box. Your "hackish" workaround may work for some datasets, but not others. Consider that you might add the same array value twice, and want it represented twice within the array.

$ echo 3 | awk 'BEGIN{ a[1]=5; a[2]=12; a[3]=2 }
>   { a[$1] = $1 }
>   END {print length(a) " - " a[3]}'
3 - 3

The best solution may be informed by the data that are in the array, but here are some thoughts.

First off, if you are certain that your index will always be numeric, will always start at 1, and that you will never delete array elements, then tripleee's suggestion of A[length(A)+1]="value" may work for you. But if you do delete an element, then your next write may overwrite your last element.
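
To make the overwrite concrete, here is a small sketch. It assumes an awk where `length()` accepts an array (gawk, the One True Awk, and recent mawk do); the array contents and the `out` shell variable are invented for illustration.

```shell
# Show that length(a)+1 can land on an index that is still in use after a delete.
out=$(awk '
  BEGIN {
    a[1] = "one"; a[2] = "two"; a[3] = "three"
    delete a[1]                  # length(a) drops to 2, but index 3 is still occupied
    a[length(a) + 1] = "four"    # computes 2 + 1 = 3 and silently replaces "three"
    print length(a), a[3]
  }')
printf '%s\n' "$out"
```

This prints `2 four`: the array still has two elements, and "three" is gone.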

If your index does not matter, and you're not worried about wasting space with long keys, you could use a random number that's long enough to reduce the likelihood of collisions. A quick & dirty option might be:

srand()
a[rand() rand() rand()]="value"

Remember to use srand() for better randomization, and don't trust rand() to produce actual random numbers. This is a less than perfect solution in a number of ways, but it has the advantage of being a single line of code.

If your keys are numeric but possibly sparse, as in the example that would break tripleee's solution, you can add a small search to your push function:

function push (a, v,     n) {
  n=length(a)+1
  while (n in a) n++
  a[n]=v
}

The while loop ensures that you'll assign an unused index. This function is also compatible with arrays that use non-numeric indices -- it assigns keys that are numeric, but it doesn't care what's already there.

Note that awk does not guarantee the order of elements within an array, so the idea that you will "push an item onto the end of the array" is wrong. You'll add this element to the array, but there's no guarantee it will appear last when you step through with a for (i in a) loop.

$ cat a
#!/usr/bin/awk -f

function push (a, v,     n) {
  n=length(a)+1
  while (n in a) n++
  a[n]=v
}

{
  push(a, $0)
}

END {
  print "length=" length(a)
  for(i in a) print i " - " a[i]
}

$ printf '3\nfour\ncinq\n' | ./a
length=3
2 - four
3 - cinq
1 - 3
ghoti