
I have the code given below. I want to print each word and its number of occurrences without using external utilities such as wc, awk, or tr.

I can count the total number of words, but there is one problem: the word count in the output is lower than it should be.

What should I do?

#!/bin/bash
#v=1

echo -n "ENTER FILE NAME: "
read file
IFS=$'\n'
cnew_line=`echo -e "\n"`
cspace=`echo  " "`

if [ $# -ne 0 ] 
then

echo "You didn't entered a filename as a parameter"
exit

elif [ $# -eq 0 ] 
then
filename="$file"

num_line=0
num_word=0
num_char=0

while read -n1  w
do
if [ "$w" = "$cnew_line" ]
then
(( num_line++ ))
elif [ "$w" = "$cspace" ]
then

(( num_word++ ))

else
(( num_char++ ))
fi
done < "$filename"


echo "Line Number = $num_line"
echo "Word Number = $num_word"
echo "Character Number =$num_char"

fi

agc
    Doing this in pure Bash is an extremely inefficient and clunky use of tools. Can you explain why you want to do this in an environment which isn't a particularly good fit for the task? Also, your code lacks indentation and triggers a number of warnings on http://shellcheck.net/. You should get these things into shape before asking for help here. – tripleee Dec 19 '17 at 05:54
    Assigning a newline to `IFS` with `IFS=$'\n'` and then doing it again the wrong way on the next line suggests you don't really understand your own code. Hint: `cnew_line` does not end up containing a newline. – tripleee Dec 19 '17 at 05:55
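
The hint in the comment above can be checked directly. The following is a minimal sketch, not part of the original thread; `newline` is an illustrative variable name. It compares the length of the string produced by the question's command substitution with the length of one produced by ANSI-C quoting.

```shell
#!/bin/bash
# Command substitution strips trailing newlines, so this variable ends up empty.
cnew_line=`echo -e "\n"`
echo "length via command substitution: ${#cnew_line}"

# ANSI-C quoting keeps the newline character.
newline=$'\n'
echo "length via ANSI-C quoting: ${#newline}"
```

The first length is 0 and the second is 1. Separately, the loop in the question increments `num_word` only when it reads a space, so the last word on each line is never counted, which would explain a total that is too low.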

2 Answers


You could use an associative array for counting the words, a bit like this:

$ cat foo.sh
#!/bin/bash

declare -A words

while read line
do
    for word in $line
    do
        ((words[$word]++))
    done
done

for i in "${!words[@]}"
do
    echo "$i:" "${words[$i]}"
done

Testing it:

$ echo this is a test is this | bash foo.sh
is: 2
this: 2
a: 1
test: 1
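
The question also asks for the total word count. As a rough sketch (not part of the original answer), the per-word counts can be summed once the array is filled. The array below is hard-coded with the counts from the test run above, and `total` and `count` are illustrative names.

```shell
#!/bin/bash
# Counts copied from the test run above; in foo.sh the array is filled by the read loop.
declare -A words=([this]=2 [is]=2 [a]=1 [test]=1)

total=0
# "${words[@]}" expands to the values (the counts), not the keys.
for count in "${words[@]}"; do
    total=$(( total + count ))
done

echo "Total words: $total"
```

For the six-word test input this prints `Total words: 6`.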

This answer was constructed pretty much from these fine answers: this and this. Don't forget to upvote them.

James Brown
  • That code considers punctuation part of a word, and it can't handle apostrophes. Example: `echo "Seward's folly" | bash foo.sh` returns "*bad array subscript*". – agc Dec 19 '17 at 15:27
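
One way to sidestep the "bad array subscript" error is to avoid putting the key inside `(( ))` and to use a plain assignment instead. The sketch below is a different technique from the one in either answer; `count_line` is a hypothetical helper name. Punctuation still counts as part of the word.

```shell
#!/bin/bash
declare -A words=()

# count_line is a hypothetical helper used only for this sketch.
count_line() {
    local -a fields
    local word
    # read -a splits on whitespace without pathname expansion.
    read -r -a fields <<< "$1"
    for word in "${fields[@]}"; do
        # Plain assignment: the key is expanded once and is not parsed as arithmetic.
        words[$word]=$(( ${words[$word]:-0} + 1 ))
    done
}

count_line "Seward's folly was Seward's gain"

for word in "${!words[@]}"; do
    printf '%s: %s\n' "$word" "${words[$word]}"
done
```

The output order of an associative array is unspecified, so the four lines may appear in any order.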

Two improved versions of James Brown's answer (which treats punctuation as part of a word and breaks on single and double quotes):

  1. Punctuation considered part of word:

    #!/bin/bash
    declare -A words
    
    while read line ; do
        for word in ${line} ; do
            ((words[${word@Q}]++))
        done
    done
    
    for i in ${!words[@]} ; do
        echo ${i}: ${words[$i]}
    done
    
  2. Punctuation not part of a word (like wc):

    #!/bin/bash
    declare -A words
    
    while read line ; do
        line="${line//[[:punct:]]}"
        for word in ${line} ; do
            ((words[${word}]++))
        done
    done
    
    for i in ${!words[@]} ; do
        echo ${i}: ${words[$i]}
    done
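
To make the second version's punctuation handling concrete, here is a small sketch of what the `${line//[[:punct:]]}` expansion does on its own. The sample sentence and the `stripped` variable are illustrative, not from the answer.

```shell
#!/bin/bash
line="Don't panic, it's only a test!"

# Delete every character in the [[:punct:]] class; nothing is put in its place.
stripped="${line//[[:punct:]]}"

echo "$stripped"
```

This prints `Dont panic its only a test`. The apostrophes are removed along with the comma and the exclamation mark, so contractions are counted in their stripped form.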
    

Tested code, with tricky quoted text:

  • fortune -m "swear" | bash foo.sh

  • man bash | ./foo.sh | sort -gr -k2 | head

agc