
How can I find the 5 most frequent strings and their number of occurrences in this case?

I have a loop in a bash script, and in this loop one variable is set to some string on every iteration.

I need to save the 5 most frequent strings to some variable(s) (probably an array?) together with their number of occurrences (a second array?) so that I can work with them later in the script.

This is my attempt so far:

last=0 # number of strings stored so far (index of the next free slot)

for i in ...
do

string=... # this changes on each iteration

placed=0 # set to 1 once the string has been found in ARRAY
index=0

    while [ "$placed" -ne 1 ] # search ARRAY for the string
    do
        if [ "$last" -eq "$index" ] ; then # end reached without a match: append the string
            ARRAY[index]="$string"
            OCCURENCE[index]=1
            (( index++ ))
            (( last++ ))
            break
        fi

        # note the braces: "$ARRAY[$index]" would expand to "${ARRAY[0]}[$index]"
        if [ "$string" == "${ARRAY[index]}" ] ; then
                # the second array holds the occurrences; increment the same index there
                (( OCCURENCE[index]++ ))
                placed=1
        fi

        (( index++ ))
    done

done

If the main for loop runs 10 iterations and the strings are

"hello 1"
"hello 2"
"hello 3"
"hello 1"
"hello 1"
"hello 2"
"hello 4"
"hello 5"
"hello 6"
"hello 2"

I would like to have an array with the strings

"hello 1"
"hello 2"
"hello 3"
"hello 4"
"hello 5"
"hello 6"

and an array with their occurrences

3
3
1
1
1
1
Ators

2 Answers


How about simply:

#!/usr/bin/env bash

declare -A array

while read -r line
do
    (( array["$line"]++ ))
done<input_file

for i in "${!array[@]}"
do
    echo "$i has count of ${array[$i]}"
done
grail
  • Doesn't this just calculate number of lines of the strings? – Ators Mar 11 '17 at 16:25
  • Not sure what you mean? The indexes of the array are the lines from the file, and the numbers shown are the counts of each line's occurrences. If this is not what you want, then I have misunderstood your question and example. – grail Mar 11 '17 at 16:52
  • Oh that's really smart! You still need to order them afterwards though... – jraynal Mar 11 '17 at 17:13
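
The ordering step raised in the last comment could look roughly like the sketch below. It is only an illustration, not part of the original answer: the sample data and the names counts, top_strings and top_counts are assumptions, and it assumes bash 4 or later and strings that contain no tabs or newlines.

```shell
#!/usr/bin/env bash
# Sketch: count strings in an associative array, then keep the 5 most frequent.
# The sample data and all variable names here are illustrative assumptions.

declare -A counts
samples=("hello 1" "hello 2" "hello 3" "hello 1" "hello 1"
         "hello 2" "hello 4" "hello 5" "hello 6" "hello 2")

for s in "${samples[@]}"
do
    # Start from 0 when the key has not been seen yet.
    counts["$s"]=$(( ${counts[$s]:-0} + 1 ))
done

top_strings=()
top_counts=()

# Emit "count<TAB>string" lines, sort by count (descending), then by string,
# and read the first 5 lines back into two parallel arrays.
while IFS=$'\t' read -r n s
do
    top_counts+=("$n")
    top_strings+=("$s")
done < <(
    for s in "${!counts[@]}"
    do
        printf '%s\t%s\n' "${counts[$s]}" "$s"
    done | sort -t $'\t' -k1,1nr -k2,2 | head -n 5
)

for ((i = 0; i < ${#top_strings[@]}; i++))
do
    echo "${top_strings[i]}: ${top_counts[i]}"
done
```

The process substitution keeps the while loop in the current shell, so both arrays are still set after the loop and can be used later in the script.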

I think what you want is solved in this question.

The solution is to use sort and uniq to get your desired output.

declare -a lines
declare -a count

# $list is assumed to hold one string per line.
# uniq -c prints "count string"; read puts the count in n and the rest of the line in line.
while read -r n line
do
    count+=("$n")
    lines+=("$line")
done < <(printf '%s\n' "$list" | sort | uniq -c | sort -k1,1nr | head -n 5) # the 5 most frequent, highest count first

for ((i=0; i<${#lines[@]}; i++))
do
   echo "${lines[i]} ${count[i]}"
done
jraynal
  • In what form can I pass the string values in as the list, other than a temporary file? And how can I save the result for the script instead of just printing it? Thank you – Ators Mar 11 '17 at 15:43
  • You can put that in a function and pipe it in another command. – jraynal Mar 11 '17 at 15:49
  • I am not sure what you mean; I am a newbie to bash. Should I put your code in the function, or in my main loop? The loop iterates and the string variable changes there, so after it changes, should I call a function with your code? – Ators Mar 11 '17 at 16:20
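
One possible way to combine the sort and uniq approach with the main loop, without a temporary file, is sketched below. This is an illustration only: the function name top5 and the variables seen, top_strings and top_counts are hypothetical, the loop body is a stand-in for the real one, and the strings are assumed to contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: collect the strings inside the main loop, then rank them once afterwards.
# top5, seen, top_strings and top_counts are hypothetical names.

# Prints "count string" for the 5 most frequent arguments, one per line.
top5() {
    printf '%s\n' "$@" | sort | uniq -c | sort -k1,1nr | head -n 5
}

seen=()
for i in 1 2 3 1 1 2 4 5 6 2      # stand-in for the real loop
do
    string="hello $i"             # stand-in for the value computed each iteration
    seen+=("$string")             # remember it; counting happens after the loop
done

top_strings=()
top_counts=()

# read splits off the count as the first word; the rest of the line is the string.
while read -r n s
do
    top_counts+=("$n")
    top_strings+=("$s")
done < <(top5 "${seen[@]}")
```

Nothing is printed here: after the loop, "${top_strings[0]}" and "${top_counts[0]}" hold the most frequent string and its count, and the script can use the two arrays in any later step.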