This script is meant to read a directory of text files and match each one against the keywords in input.txt. Reading the words from input.txt works, but I don't know how to extract each word from a text file and compare it with the keyword. The files are in paragraph form, so I can't just look for similar characters. Is there a way to read the words one at a time and compare each of them?

#!/bin/bash

findkeyword () {
    file="$1"   
    keyword="$2"    
    value="$3"

    count=0
    while read line
    do

#problem right here


    set -- $line
    a=$(expr length "$file")
        for i in '$line'; do
                    if [ "$i" = "$keyword" ]; then
                count=`expr $count + 1`;
            fi
            done

    done <$file

    echo "Profile: " $file
    scorefile $value $count
}

scorefile () {
    value="$1"
    count="$2"

    echo "Score: "  $((value * count)) 

}


while read line
        do
        set -- $line
        keyword=$1
            value=$2

        echo "key: " $keyword
        echo "value: " $value

        for xx in `ls submissions/*`
            do
                     filename=$xx
                     findkeyword $filename $keyword $value
            done
        done <input.txt
imm
user968623
  • You already do `set -- $line` which does precisely what you are asking for. You have the words in "$@" at this point. – tripleee Oct 11 '11 at 07:38
  • Also the quoting `'$line'` prevents expansion. If you take out the single quotes, your code should do what you want (although still not very elegantly). You do not seem to be using the value of `a` at all, and the `set -- $line` is not doing anything in the `findkeywords` function. – tripleee Oct 11 '11 at 07:48
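As a minimal sketch of the word-by-word comparison the comments describe, the function below lets `read -a` split each line into words and compares every word with the keyword. The name `count_word` is hypothetical, and the sketch assumes words are separated by whitespace only, so a token such as `word,` does not match `word`.

```bash
#!/bin/bash

# count_word FILE KEYWORD (hypothetical helper): print how many
# whitespace-separated words in FILE are exactly equal to KEYWORD.
count_word() {
    local file=$1 keyword=$2 count=0 word
    local -a words
    # read -a splits the line into the array "words" without glob expansion.
    # The || test keeps a final line that lacks a trailing newline.
    while read -r -a words || [ "${#words[@]}" -gt 0 ]; do
        for word in "${words[@]}"; do
            if [ "$word" = "$keyword" ]; then
                count=$((count + 1))
            fi
        done
    done < "$file"
    echo "$count"
}
```

It could be called as `count_word submissions/somefile keyword`, and the printed count multiplied by the value from input.txt.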

1 Answer

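To count the occurrences of a word in a file, use grep -c (count). Note that it reports the number of matching lines, so a word that appears twice on one line is counted once: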

for word in $(<input.txt); do echo -n "$word " ; grep -c "$word" "$file"; done

For different files in a dir, never¹ ever use ls.

 for file in submissions/*
 do
      echo "$file"
      for word in $(<input.txt)
      do
          echo -n "$word " ; grep -c "$word" "$file"
      done
 done 

¹ In very, very rare cases it might be the best solution, but blanks, line feeds, and special characters in filenames will corrupt your commands.
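Building on the loops above, here is a hedged sketch that counts whole-word occurrences and multiplies them by the value, as the question's `scorefile` does. The function names `score_file` and `score_all` are hypothetical. It assumes each line of input.txt holds a keyword followed by an integer value, and that the installed grep supports `-o` and `-w` (GNU and BSD grep do; they are not required by POSIX).

```bash
#!/bin/bash

# score_file FILE KEYWORD VALUE (hypothetical helper): print VALUE times
# the number of whole-word occurrences of KEYWORD in FILE.
score_file() {
    local file=$1 keyword=$2 value=$3 count
    # -o prints each match on its own line, -w matches whole words only,
    # -F treats the keyword as a fixed string rather than a regex.
    count=$(grep -o -w -F -- "$keyword" "$file" | wc -l)
    echo $((value * count))
}

# score_all INPUT DIR (hypothetical helper): for every "keyword value"
# line in INPUT, print file, keyword, and score for each file in DIR.
score_all() {
    local input=$1 dir=$2 keyword value file
    while read -r keyword value; do
        [ -n "$keyword" ] || continue      # skip blank lines
        for file in "$dir"/*; do           # glob instead of ls
            [ -f "$file" ] || continue
            printf '%s %s %s\n' "$file" "$keyword" \
                "$(score_file "$file" "$keyword" "$value")"
        done
    done < "$input"
}
```

Calling `score_all input.txt submissions` would print one line per keyword and file. Piping `grep -o` into `wc -l` is used instead of `grep -c` so that several occurrences on the same line are each counted.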

user unknown
  • You should use `grep -w` or perhaps `grep -F -w` to count only exact word matches. This also fails to update the scores. – tripleee Oct 11 '11 at 07:46
  • Are you sure what the question is? What do you mean by `update the scores`? Is he searching for the sum of matches per keyword over all files? – user unknown Oct 12 '11 at 02:33
  • The problem description certainly implies to me that if "as" is a keyword, you should not include occurrences of e.g. "wash" in the count. Restricting matches to whole words prevents that. Depending on the input, you might also add `-o` to count multiple occurrences on the same line separately. Other than that, I like the clarity of this answer, although it doesn't implement all of the question's code, which also computes a score of occurrences times value. – tripleee Oct 12 '11 at 04:26