
In a shell script using bash, I would like to find the most frequent number in an array and store it in the variable $result. The array could have any number of values. If several numbers tie for most frequent, I would like to select the lowest one.

I understand bash may not be the best tool for this and I am open to suggestions using tools available from the command line within my script on a Mac OS X system.

Example:

array=(03 03 03 04 04 04 04)
3 occurrences of 03
4 occurrences of 04
Should return 04 into a variable named $result.

Another example:

array=(03 03 03 03 04 04 04 04)
4 occurrences of 03
4 occurrences of 04
Select lowest number which is 03
Should return 03 into a variable named $result.

Thank you for your help.

codeforester
KeelDev
  • I'm not sure if it would have helped for this particular question or not. I'm not a bash guru or anything. But in any case, you can get a newer version of bash very easily through [Homebrew](https://brew.sh/). The latest version I believe is 4.4. Many additional and helpful features are available in the newer versions. – I0_ol Apr 17 '17 at 03:45

3 Answers


There is an ambiguity in your question which needs to be resolved: you say that the array is an array of numbers, but the example presents them with leading zeros, which will lead to some surprises if you treat the strings as numbers (a leading 0 makes bash read them as octal).
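A quick illustration of that octal surprise (not part of the original answer): bash arithmetic reads a leading 0 as an octal prefix, while the standard `base#value` syntax, here `10#`, forces decimal. The variable `v` is just for illustration.

```bash
#!/bin/bash
# In bash arithmetic a leading 0 marks an octal literal.
echo $(( 010 ))        # prints 8, not 10
# $(( 08 )) would be an error: 8 is not an octal digit.

# The base#value syntax forces base 10 regardless of leading zeros.
v=08
echo $(( 10#$v ))      # prints 8
```

If the values must be compared as decimal numbers, the `10#` prefix could be applied to each element before it is used in arithmetic.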

Other than that, the solution is relatively simple: use sort and uniq -c to count the instances of each value, sort the result by count (highest first, breaking ties by value), and then extract the first value. Since sort works on lines, we start by writing the array one element per line using printf:

printf '%s\n' "${arr[@]}" | sort | uniq -c |
sort -k1,1nr -k2 | awk '{print $2; exit}'

Both sort invocations compare the values themselves as strings (only the counts are compared numerically). If you really want to compare the values as numbers, you could use:

printf '%d\n' "${arr[@]}" | sort -n | uniq -c |
sort -k1,1nr -k2n | awk '{print $2; exit}'

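although that will normalize all numbers to a canonical form (so that 03 will become 3). Note also that printf '%d' treats a leading 0 as octal, so 010 becomes 8 and values such as 08 or 09 produce an error.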
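To compare the two pipelines side by side, here is a sketch that wraps each one in a function (the names `by_string` and `by_number` are hypothetical). With zero-padded values of equal width, as in the question, both pick the same value; they differ when a tie involves values of different widths, such as 9 and 10.

```bash
#!/bin/bash
# Hypothetical wrappers around the two pipelines from this answer.
# Each prints the most frequent argument.

# Values compared as strings: on a tie, "10" sorts before "9".
by_string() {
  printf '%s\n' "$@" | sort | uniq -c |
    sort -k1,1nr -k2 | awk '{print $2; exit}'
}

# Values compared as numbers: printf '%d' drops leading zeros
# (and treats a leading 0 as octal).
by_number() {
  printf '%d\n' "$@" | sort -n | uniq -c |
    sort -k1,1nr -k2n | awk '{print $2; exit}'
}

array=(03 03 03 03 04 04 04 04)
result=$(by_string "${array[@]}")
echo "$result"            # prints 03

by_string 9 9 10 10       # prints 10
by_number 9 9 10 10       # prints 9
```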

rici
  • this didn't work for -- > arr=(03 03 03 04 04 04 04) – KeelDev Apr 16 '17 at 19:02
  • @keelDev: There are (well, were) three suggestions in that answer; you didn't specify which one doesn't work, but on reflection it must be the last one since the printf is wrong (and also, more seriously, the use of uniq). So I deleted it; the other two work fine within their respective constraints (either treat the data as strings or normalize the numbers). – rici Apr 16 '17 at 19:33

Here's an awk-based solution that avoids bash associative arrays (which require bash 4.0 or newer):

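#!/bin/bash
# Reads one value per line on stdin and prints the most frequent one;
# on a tie, the lower value wins.
get_result(){
awk '
  {
      n=++hsh[$1]             # occurrences of this value so far
      if(n>max_occ){          # strictly more frequent: new leader
         max_occ=n
         what=$1
      }else if(n==max_occ){   # tie: keep the lower value
         if(what>$1)
             what=$1
      }
  }
  END { print what }
'
}

array=(03 03 03 04 04 04 04)
result=$(printf "%s\n" "${array[@]}" | get_result)
echo "$result"

array=(03 03 03 03 04 04 04 04)
result=$(printf "%s\n" "${array[@]}" | get_result)
echo "$result"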

The results are 04 and 03, as in your examples.
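Because `what` and `$1` both hold numeric-looking input fields, a POSIX-conforming awk compares them as numbers, so the tie-break picks the numerically lower value even when the values have different widths. A small check, with the function repeated so the snippet runs on its own:

```bash
#!/bin/bash
# Same logic as get_result above: most frequent line wins, lower value on a tie.
get_result(){
awk '
  {
      n=++hsh[$1]
      if(n>max_occ){ max_occ=n; what=$1 }
      else if(n==max_occ && what>$1){ what=$1 }
  }
  END { print what }
'
}

# 10 and 9 both occur twice; the numeric comparison keeps 9.
printf '%s\n' 10 10 9 9 | get_result
```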

Petr Skocik

You can use an associative array (bash 4.0 or newer) to keep track of how often each element occurs:

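#!/bin/bash
# Requires bash 4.0+ for associative arrays (declare -A).

arr=(1 2 3 4 4 44 4 4 5 5)
declare -A hash
max_times=0
for i in "${arr[@]}"; do
  ((hash[$i]++))            # count occurrences of this element
  h=${hash[$i]}
  # Numeric comparison; [[ $h > $max_times ]] would compare as strings
  if (( h > max_times )); then
    max=$i
    max_times=$h
  fi
done

echo "max=$max, max_times=$max_times"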

Output:

max=4, max_times=4
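The loop above keeps the first value to reach the highest count, so it does not apply the question's lowest-value tie-break. A sketch that adds it, still assuming bash 4.0+ and non-negative integer values (leading zeros allowed); the function name `most_frequent` is hypothetical:

```bash
#!/bin/bash
# Requires bash 4.0+ for associative arrays.
# Assumes non-negative integers; 10# reads values like 08 as decimal.
most_frequent() {   # hypothetical name
  local -A count
  local v c best="" best_count=0
  for v in "$@"; do
    count[$v]=$(( ${count[$v]:-0} + 1 ))
    c=${count[$v]}
    # Take v if it is now strictly more frequent, or equally frequent
    # but numerically lower than the current best.
    if (( c > best_count )) ||
       { (( c == best_count )) && (( 10#$v < 10#$best )); }; then
      best=$v
      best_count=$c
    fi
  done
  printf '%s\n' "$best"
}

most_frequent 03 03 03 03 04 04 04 04   # prints 03
```

The original spelling of each value is preserved, so `result=$(most_frequent "${array[@]}")` would store 03 rather than 3.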

If we can't use associative arrays, then we can make use of external tools:

array=(1 3 3 333 5 66 5 33 66 66 33 22 11)
# Highest count first; on a tie, the numerically smallest value wins.
printf '%d\n' "${array[@]}" | sort -n | uniq -c | sort -k1,1nr -k2,2n | head -1

Output:

  3 66
codeforester
  • thank you for the reply, unfortunately I only have bash 3.2 available and would prefer not to mess with upgrading bash on Mac OS X. To my knowledge the declare function is unavailable in bash versions <4.0 – KeelDev Apr 16 '17 at 18:26
  • @KeelDev `declare` itself is older than Bash 4.0, but `declare -A` for associative arrays is 4.0 or newer. – Benjamin W. Apr 16 '17 at 19:34