shell script to find a word in a list of files, all of them given as parameters

Question

I need a simple shell program which has to do something like this:

script.sh word_to_find file1 file2 file3 .... fileN

which will display

word_to_find 3 - if word_to_find appears in 3 files

or

word_to_find 5 - if word_to_find appears in 5 files

This is what I've tried

#!/bin/bash

count=0
for i in $@; do
  if [ grep '$1' $i ];then
     ((count++))
  fi
done
echo "$1 $count"

But this message appears:

syntax error: "then" unexpected (expecting "done").

Before this the error was

[: grep: unexpected operator.

It would be helpful if you gave the exact code that causes the error about expecting done; the code you show is not the code that gives that error. — Jonathan Leffler, Mar 28 '15 at 21:20

score 2 · Answer 1 · edited May 23 '17 at 10:24

2

The code you showed is:

#!/bin/bash

count=0
for i in $@; do
  if [ grep '$1' $i ];then
     ((count++))
  fi
done
echo "$1 $count"

When I run it, I get the error:

script.sh: line 5: [: $1: binary operator expected

This is reasonable, but it is not the same as either of the errors reported in the question. There are multiple problems in the code.

The for i in $@; do should be for i in "$@"; do. Always use "$@" so that any spaces in the arguments are preserved. If none of your file names contain spaces or tabs, it is not critical, but it is a good habit to get into. (See How to iterate over arguments in bash script for more information.)

The if operations runs the [ (aka test) command, which is actually a shell built-in as well as a binary in /bin or /usr/bin. The use of single quotes around '$1' means that the value is not expanded, and the command sees its arguments as:

[
grep
$1
current-file-name
]

where the first is the command name, or argv[0] in C, or $0 in shell. The error I got is because the test command expects an operator such as = or -lt at the point where $1 appears (that is, it expects a binary operator, not $1, hence the message).

You actually want to test whether grep found the word in $1 in each file (the names listed after $1). You probably want to code it like this, then:

#!/bin/bash

word="$1"
shift
count=0

for file in "$@"
do
    if grep -l "$word" "$file" >/dev/null 2>&1
    then ((count++))
    fi
done
echo "$word $count"

We can negotiate on the options and I/O redirections used with grep. The POSIX grep options -q and/or -s options provide varying degrees of silence and -q could be used in place of -l. The -l option simply lists the file name if the word is found, and stops scanning on the first occurrence. The I/O redirection ensures that errors are thrown away, but the test ensures that successful matches are counted.

Incorrect output claimed

It has been claimed that the code above does not produce the correct answer. Here's the test I performed:

$ echo "This country is young" > young.iii
$ echo "This country is little" > little.iii
$ echo "This fruit is fresh" > fresh.txt
$ bash findit.sh country young.iii fresh.txt little.iii
country 2
$ bash -x findit.sh country young.iii fresh.txt little.iii
+ '[' -f /etc/bashrc ']'
+ . /etc/bashrc
++ '[' -z '' ']'
++ return
+ alias 'r=fc -e -'
+ word=country
+ shift
+ count=0
+ for file in '"$@"'
+ grep -l country young.iii
+ (( count++ ))
+ for file in '"$@"'
+ grep -l country fresh.txt
+ for file in '"$@"'
+ grep -l country little.iii
+ (( count++ ))
+ echo 'country 2'
country 2
$

This shows that for the given files, the output is correct on my machine (Mac OS X 10.10.2; GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)). If the equivalent test works differently on your machine, then (a) please identify the machine and the version of Bash (bash --version), and (b) please update the question with the output you see from bash -x findit.sh country young.iii fresh.txt little.iii. You may want to create a sub-directory (such as junk), and copy findit.sh into that directory before creating the files as shown, etc.

You could also bolster your case by showing the output of:

$ grep country young.iii fresh.txt little.iii
young.iii:This country is young
little.iii:This country is little
$

edited May 23 '17 at 10:24

Community

1
1

answered Mar 28 '15 at 21:31

Jonathan Leffler

730,956
141
904
1,278

Unfortunately, count is 0 even that word appears in every file – danB Mar 28 '15 at 22:18
Curious: I just tested the second script (copying verbatim from my answer with copy'n'paste) on an Ubuntu 14.04 derivative and got the correct answer for my test `bash findit.sh for *` gave `for 10`, which is correct; 10 of the 14 files in the relevant directory contain the word `for` (one as part of 'transform', one as part of 'before', the rest as a standalone word in a script). What did you do differently, I wonder? What word are you trying? What do you see with `bash -x findit.sh for *`, for example? – Jonathan Leffler Mar 28 '15 at 23:53
Note that [tripleee](http://stackoverflow.com/users/874188/tripleee)'s [answer](http://stackoverflow.com/a/29323057/15168) uses one invocation of `grep` across all files, using a GNU `grep` extension over POSIX options with the `-m1` option. This could be advantageous if there are a lot of small files; it doesn't matter so much if there are lots of big files, especially if the search pattern only occurs near the end. He mentions a problem with counting lines from `grep -l`; that problem does not affect this code because it doesn't look at the output; it looks only at the exit status. – Jonathan Leffler Mar 29 '15 at 00:12
using your program for "findit.sh country young.iii fresh.txt little.iii" displays – danB Mar 29 '15 at 05:13
country 0 instead of country 2 (that means country found in 2 files) – danB Mar 29 '15 at 05:15
@danB: You say that `findit.sh country young.iii fresh.txt little.iii` displays `country 0` instead of `country 2`. That's puzzling! Did you use `bash -x findit.sh country young.iii fresh.txt little.iii` to see what Bash says it is doing as it executes the code? Have you run `grep country young.iii fresh.txt little.iii` to see what it says? – Jonathan Leffler Mar 29 '15 at 05:31
Rather than using double quotes in `for file in "$@"`, I would argue for brevity: `for file; do ...` – William Pursell Mar 29 '15 at 05:40
@WilliamPursell: I prefer clarity to brevity; I always write `for file in "$@"` because no-one can be confused about what it does. – Jonathan Leffler Mar 29 '15 at 05:41
@Jonathan Leffler : "grep country young.iii fresh.txt little.iii" displays the content of files and the word country inside of them is highlighted. And with "bash -x findit.sh country young.iii fresh.txt little.iii" the for loop is stucked on first file, contor remains 0 even if the word appears in that file. – danB Mar 29 '15 at 07:36
Are you sure you've got Bash? Can you add the output of the trace version of running the script. Getting stuck on the first file should mean no result is reported. I'm beginning to think that you are not using copy'n'paste to create the test script, or that you are editing it somehow before you run it. Or you've got a whacky alias set for `grep`. Or something. I have no real interest in what' sin your `.bashrc` file. Mine is minimal, as my example shows. So, at the moment, I have no explanation for what you're seeing, in part because you're not showing what you see. – Jonathan Leffler Mar 29 '15 at 07:46
@JonathanLeffler The version of bash is 'GNU bash, version 4.3.11(1)-release (i686-pc-linux-gnu)'. The script is : "word="$1" shift count=0 for file in "$@" do if grep -l -i "$1" "$i" > /dev/null 2>&1 then ((count++)) fi done echo "$word $count" " and when I write the command bash -x probl3.sh Canyon file1.iii file.iii file1.omulet this appears : + word=Canyon + shift + count=0 + for file in '"$@"' + grep -l -i file1.iii '' + for file in '"$@"' + grep -l -i file1.iii '' + for file in '"$@"' + grep -l -i file1.iii '' + echo 'Canyon 0' Canyon 0 – danB Mar 29 '15 at 08:25
@JonathanLeffler The version of bash is 'GNU bash, version 4.3.11(1)-release (i686-pc-linux-gnu). The script is : word="$1" shift count=0 for file in "$@" do if grep -l -i "$1" "$i" > /dev/null 2>&1 then ((count++)) fi done echo "$word $count " and when I write the command bash -x probl3.sh Canyon file1.iii file.iii file1.omulet this appears : + word=Canyon + shift + count=0 + for file in '"$@"' + grep -l -i file1.iii '' + for file in '"$@"' + grep -l -i file1.iii '' + for file in '"$@"' + grep -l -i file1.iii '' + echo 'Canyon 0' Canyon 0 a form of the previous message – danB Mar 29 '15 at 10:05
You are searching for `$1` instead of `$word`, but the `shift` changed the value that was in `$2` into the value that's now in `$1`. If you run the script under the `-x` option, you will see it looking for the first filename instead of 'country'. Details matter! – Jonathan Leffler Mar 29 '15 at 13:57
@JonathanLeffler Thank you very much. Details matter indeed – danB Mar 29 '15 at 14:19

tripleee · Answer 2 · 2015-03-28T22:51:00.173

2

Try this:

#!/bin/sh
printf '%s %d\n' "$1" $(grep -hm1 "$@" | wc -l)

Notice how all the script's arguments are passed verbatim to grep -- the first is the search expression, the rest are filenames.

The output from grep -hm1 is a list of matches, one per file with a match, and wc -l counts them.

I originally posted this answer with grep -l but that would require filenames to never contain a newline, which is a rather pesky limitation.

Maybe add an -F option if regular expression search is not desired (i.e. only search literal text).

edited Mar 28 '15 at 22:51

answered Mar 28 '15 at 21:53

tripleee

175,061
34
275
318

@JonathanLeffler's excellent answer already points out the problems in your code, do I won't repeat that here. – tripleee Mar 28 '15 at 21:54

Scriptkiddy1337 · Answer 3 · 2015-03-28T22:15:29.733

0

#!/usr/bin/perl
use strict;
use warnings;
my $wordtofind = shift(@ARGV);
my $regex = qr/\Q$wordtofind/s;
my @file = ();
my $count = 0;
my $filescount = scalar(@ARGV);
for my $file(@ARGV)
{
    if(-e $file)
    {
        eval { open(FH,'<' . $file) or die "can't open file $file "; };
        unless($@)
        {
            for(<FH>)
            {
                if(/$regex/)
                {
                     $count++;
                     last;
                }
            }
            close(FH);
        }
    }
}
print "$wordtofind $count\n";

edited Mar 28 '15 at 22:15

answered Mar 28 '15 at 21:13

Scriptkiddy1337

792
4
9

2

This needs some explanation, and is not really an answer to why the shell script isn't working — it is an alternative way of doing the job that the shell script is supposed to do. (I've not evaluated the Perl very much, but the use of the old style file handles and the absence of either `-w` or `use warnings;`, and the absence of `use strict;`, doesn't immediately inspire confidence.) – Jonathan Leffler Mar 28 '15 at 21:17
added `strict` and `warnings` for u. And Explanation? Its same proecdure like bash script: Save it to a file (advantageously ending with ".pl"), `chmod +x` it. I know its not the answer for the exception, but i thought i can maybe help with it. – Scriptkiddy1337 Mar 28 '15 at 21:24
2

If I may suggest, you should at least explain briefly why you would write a Perl script instead of a shell script, to give some context for your answer. The required output appears to be the number of files containing the given word. That means you could stop scanning a given file when you first encounter a match; you'd use `last` (rather than `next`) to exit the loop. However, your code is slurping the entire file into memory, which is expensive if the files are big. There's no obvious benefit to the slurping over simply reading each line in turn. – Jonathan Leffler Mar 28 '15 at 21:49
1

You should check the status from `open` and not use the handle if the open fails. And the open can fail even if the file exists — no permissions, for example. – Jonathan Leffler Mar 28 '15 at 21:59
ok ok u caught me ^^ if fixed it. And a Explaination about that? i don't think so, his First Line was: "I need a simple shell program which has to do something like this" and that did not disqualify me, the Windows Scriptkiddy. – Scriptkiddy1337 Mar 28 '15 at 22:08

score 0 · Answer 4 · answered Mar 29 '15 at 10:46

You could use an Awk script:

#!/usr/bin/env awk -f

BEGIN {
    n=0
} $0 ~ w {
    n++
} END {
    print w,n
}

and run it like this:

./script.awk w=word_to_find file1 file2 file3 ... fileN

or if you don't want to worry about assigning a variable (w) on the command line:

BEGIN {
    n=0
    w=ARGV[1]
    delete ARGV[1]
} $0 ~ w {
    n++
} END {
    print w,n
}

shell script to find a word in a list of files, all of them given as parameters

4 Answers4

Incorrect output claimed