
I am working on a course project, and the homework text is as follows:

Write a shell script that takes a word and a number as arguments. Then it checks all files in your current directory, and finds out the files which include the given word at least the given number of times.

Sample output should be :

$myprog3.sh write 2
The file "./file-comp.sh" contains the word "write" 3 times.
The file "./homework.log" contains the word "write" 11 times.

I wrote some of the code, but I'm having a problem reading the filenames into an array.

count=`find . -type f -exec grep -H $word {} \; | wc -l`
read -a filearray <<< `find . -type f -exec grep -l "$word" {} \;`
read -a numarray <<< `find . -type f -exec grep -c "$word" {} \;`
size=${#filearray[@]}
echo "Array size is "$size""
for x in `seq 0 $size`
do
echo $x
echo "${filearray[x]}"
done

The output looks like this:

Array size is 5
0
./UntitledDocument.tex~
1
./Untitled
2
Document.tex
3
./wordcounter.sh
4
./wordcounter.sh~
5

For example, it should show Untitled Document.tex as a single entry instead of

Untitled

Document.tex

How can I fix it?

Also, could you please suggest a solution for the full question? Thanks in advance.

blodrayne
  • You mean this: size=${#filearray[@]}; echo "Array size is "$size""; for x in "${filearray[@]}"; do echo "$x"; done. But it is still the same :( – blodrayne Oct 17 '13 at 12:10

3 Answers

3

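Spaces in the filenames cause the output to be split into separate words when it is assigned to the array. The simplest fix is to set IFS to something that does not contain a space, and to make read consume the whole output rather than only its first line. Instead of saying

read -a filearray <<< `find . -type f -exec grep -l "$word" {} \;`

say:

IFS=$'\n' read -r -d '' -a filearray < <(find . -type f -exec grep -l "$word" {} \;)

Here -d '' makes read continue to the end of the input, and IFS=$'\n' splits that input on newlines only. read returns a non-zero status when it reaches the end of the input, but the array is still filled.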
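A possible alternative, assuming bash 4 or newer, is mapfile (also known as readarray), which stores each line of input as one array element without touching IFS. The following is only a minimal sketch: the scratch directory, the file names, and the word "write" are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: a scratch directory with a file name containing a space.
demo_dir=$(mktemp -d)
printf 'write\n' > "$demo_dir/Untitled Document.tex"
printf 'write\n' > "$demo_dir/wordcounter.sh"
printf 'nothing here\n' > "$demo_dir/other.txt"

word="write"

# mapfile -t stores one input line per array element and strips the newline.
# sort is only there to make the order predictable for the demo.
mapfile -t filearray < <(find "$demo_dir" -type f -exec grep -l -- "$word" {} \; | sort)

echo "Array size is ${#filearray[@]}"
for x in "${filearray[@]}"; do
    echo "${x##*/}"   # print just the base name
done

rm -rf "$demo_dir"
```

File names that themselves contain a newline would still be split by this approach; handling those needs NUL-separated output.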
devnull
  • What's wrong with this to attract downvotes after more than 4 months of the post? Could someone leave a note instead suggesting what is wrong? – devnull Mar 29 '14 at 14:40
1

Since grep -Hc prints one line per file in the form

file:number_of_occurrences

You can do it as follows:

declare -A arr
# Assumes no file name contains ':'; -r keeps backslashes in names literal
while IFS=: read -r file count
do
    arr["$file"]=$count         #### "$file" to allow spaces in the names
done < <(find . -type f -exec grep -Hc "$word" {} \;)

So that you have an associative array

([file1]=>number_of_occurrences_file1 [file2]=>number_of_occurrences_file2)

And then you can loop as follows:

for key in "${!arr[@]}"; do    ### double quotes to accept keys with spaces
    echo "$key = ${arr[$key]}"
done
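Applied to the original task, the same idea might look roughly like the sketch below. It is not a definitive solution: it assumes bash 4 or newer (for declare -A) and a grep that supports -o and -w, and the function name report_matches and the demo files are invented for illustration. Because grep -c counts matching lines rather than individual occurrences, the sketch counts with grep -ow piped to wc -l instead.

```shell
#!/usr/bin/env bash
# report_matches WORD MIN [DIR]: hypothetical helper, not part of the original answer.
report_matches() {
    local word=$1 min=$2 dir=${3:-.} file count
    declare -A arr
    while IFS= read -r -d '' file; do
        # -o prints each match on its own line, -w matches whole words only.
        count=$(grep -ow -- "$word" "$file" | wc -l)
        arr["$file"]=$((count))   # $(( )) drops the padding some wc versions add
    done < <(find "$dir" -type f -print0)

    for file in "${!arr[@]}"; do
        if [ "${arr[$file]}" -ge "$min" ]; then
            echo "The file \"$file\" contains the word \"$word\" ${arr[$file]} times."
        fi
    done
}

# Demo with made-up files: only the first one reaches the threshold of 2.
demo_dir=$(mktemp -d)
printf 'write write\nwrite\n' > "$demo_dir/file comp.sh"
printf 'write once\n' > "$demo_dir/homework.log"
result=$(report_matches write 2 "$demo_dir")
echo "$result"
rm -rf "$demo_dir"
```

The find output is NUL-separated here so that the file name can be read whole even if it contains a colon, which the file:count format cannot distinguish.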

Partly based on Bash script “find” output to array.

Community
fedorqui
0

You're running the same command three separate times! And, the find command can take a long time to run.

I would take a look at your loop and see if you can do all of your steps in that single loop:

file_count=0
find . -type f -print0 | while IFS= read -r -d $'\0' file
do
    ((file_count+=1))  # Count the number of files processed
    # here be dragons...
    echo "The '$file' file contains '$word' $word_count times"
done

The -print0 argument separates the file names with the NUL character (one of the two characters that can't appear in a file name; for extra credit, can you name the other?). You pipe this into a while read loop to read each file name. The -d $'\0' tells read to split its input on the NUL character.

Not only does this take care of spaces in file names, but also tabs, double spaces, carriage returns, newlines, and almost anything else that can be tossed into the mix. You're guaranteed to read exactly one file name at a time, no matter how funky that file name is.

Piping the output of a command into a while read statement is a fairly efficient operation. The two sides run in parallel: while the command is still producing output, the while loop is already processing it. Take a good look at the structure of this loop, because you will be seeing it over and over again in your shell scripts.

The ((...)) construct evaluates an arithmetic expression; here it increments the counter.
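One caveat with the pipe form: by default, bash runs each part of a pipeline in a subshell, so a counter incremented inside the piped while loop is not visible after done. Below is a minimal sketch of a variant that feeds the loop through process substitution instead, which keeps the loop in the current shell. The scratch directory, file names, and the variable piped_count are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: three files, two of them with spaces in the name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a file" "$demo_dir/b" "$demo_dir/c d e"

# Pipe form: the while loop runs in a subshell, so the increments are lost.
piped_count=0
find "$demo_dir" -type f -print0 | while IFS= read -r -d '' file; do
    ((piped_count+=1))
done

# Process substitution: the loop runs in the current shell.
file_count=0
while IFS= read -r -d '' file; do
    ((file_count+=1))
done < <(find "$demo_dir" -type f -print0)

echo "pipe: $piped_count, process substitution: $file_count"
rm -rf "$demo_dir"
```

With default shell options this should report 0 for the pipe form and 3 for the process substitution form. Output printed inside the loop is unaffected either way; only variables set in the loop differ.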

The here be dragons... is where you fill in the logic to get the information you need. After all, it is a homework assignment. However, it looks like you have a good handle on shell scripting.


If you have to have these two arrays, I would pipe the output of the find into an array, then use that array to put your information into the numarray and filearray. It's not efficient, but at least you aren't running the find command three separate times.
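That two-array idea might look roughly like the sketch below. It assumes bash 4.4 or newer, because mapfile -d '' is used to read NUL-separated names. The array name allfiles and the demo files are hypothetical, while filearray and numarray follow the question.

```shell
#!/usr/bin/env bash
# Hypothetical demo data.
demo_dir=$(mktemp -d)
printf 'write\nwrite\n' > "$demo_dir/Untitled Document.tex"
printf 'read\n' > "$demo_dir/notes.txt"
word="write"

# Run find once and keep every file name, spaces and all.
mapfile -d '' -t allfiles < <(find "$demo_dir" -type f -print0)

filearray=()
numarray=()
for file in "${allfiles[@]}"; do
    # grep -c counts matching lines and exits non-zero when that count is 0.
    count=$(grep -c -- "$word" "$file") || continue
    filearray+=("$file")
    numarray+=("$count")
done

# The two arrays stay aligned by index.
for i in "${!filearray[@]}"; do
    echo "${filearray[i]##*/}: ${numarray[i]}"
done
rm -rf "$demo_dir"
```

grep still runs once per file, but find itself runs only once.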

David W.