Looping over files in UNIX

Question

I'm trying to loop over all the files in a directory whose name is given as a command-line argument (e.g. myfolder). For each file, grep should run on that file and count the number of times a phrase (e.g. myphrase) occurs in it.

When I run my code, similar to below, I get the error "No such file or directory". I've tried calling the script using ./myscript.sh myfolder and ./myscript.sh /[fullpath]/myfolder and they both result in the same error.

for f in "$1"
do
  echo "processing $f file"
  grep -o '<myphrase>' "$f" | wc -l
done

Any ideas as to what's going wrong? If it helps the script is being run from within the same folder as the text files, and the command must be called with the folder name as an argument - both annoying requirements I must follow.

Edit: running ls -ld for this folder gives drwxr-xr-x@ 829 user staff 28186 7 Feb 17:19 my folder

Use: `for f in "$1"/*` and then check if `$f` is a file using `[[ -f $f ]]` — anubhava, Feb 10 '16 at 16:36
I think it must be to do with the argument, as I added your suggestion but it still comes back with no file/directory error — jsc, Feb 10 '16 at 16:41
Your script should be placed in the parent directory of `myfolder` — anubhava, Feb 10 '16 at 17:26
Actually -- are you sure this isn't a DOS-format text file (the very first "before you ask" thing to check listed in the SO bash tag wiki)? If your shebang (that is, the `#!/bin/sh` or `#!/bin/bash` line at the top) is telling the system to run it with an interpreter named `/bin/bash$'\r'` because it has a CRLF newline, then it's no surprise that it would give a "file not found" when run. — Charles Duffy, Feb 10 '16 at 17:41
...re: that tag wiki, see http://stackoverflow.com/tags/bash/info — Charles Duffy, Feb 10 '16 at 17:42
Also, in general: Instead of saying "code similar to X gives behavior Y", actually *test your simplified code*, and give its *exact* error message (copied-pasted with all necessary context; a `file not found` coming from grep is very different than one coming from the OS kernel when trying to `exec` your script). — Charles Duffy, Feb 10 '16 at 17:44

score 2 · Answer 1 · edited May 23 '17 at 12:07

As anubhava mentioned above:

In your for loop use:

for f in "$1"/*
do
...
done

You can then check if f is a file with:

[[ -f $f ]]

And perform your necessary logic inside the loop:

[[ -f $f ]] && grep -o '<myphrase>' "$f" | wc -l

So in summary:

for f in "$1"/*
do
 echo "processing $f file"
 [[ -f $f ]] && grep -o '<myphrase>' "$f" | wc -l
done
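
A slightly fuller sketch of the same loop, written as a function so it can be tried in isolation. The name `count_matches_in_dir` and its two arguments are hypothetical, not part of the original script. It uses the portable `[ -f ... ]` test in place of `[[ ... ]]`, and that same test also skips the literal `myfolder/*` string the loop receives when the glob matches nothing.

```shell
# Hypothetical helper: print "path: count" for every regular file in a directory.
# $1 = directory, $2 = phrase to count (both names are assumptions for illustration).
count_matches_in_dir() {
  dir=$1
  phrase=$2
  for f in "$dir"/*; do
    # Skip subdirectories, and the unexpanded pattern if nothing matched.
    [ -f "$f" ] || continue
    # grep -o prints each occurrence on its own line; wc -l counts them.
    # tr strips the padding some wc implementations add.
    count=$(grep -o -- "$phrase" "$f" | wc -l | tr -d ' ')
    printf '%s: %s\n' "$f" "$count"
  done
}
```

Called as `count_matches_in_dir myfolder myphrase`, it prints one line per regular file and nothing at all for an empty or unreadable directory.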

answered Feb 10 '16 at 16:39

Caleb Adams

Do you mean to say `[[ -f $f ]] && grep -o '<myphrase>' "$f" | wc -l` above? Your `if` is fine, but why drop `[[` in favor of `[` ? – David C. Rankin Feb 10 '16 at 16:40
still no luck I'm afraid! Is it the way I'm running the script? ./scriptname.sh myfolder – jsc Feb 10 '16 at 17:10
@jsc, if you run `bash -x ./scriptname myfolder`, is the output enlightening? – Charles Duffy Feb 10 '16 at 17:40
@jsc what shebang are you currently using? – Caleb Adams Feb 10 '16 at 17:56
@user3704230, ...specifically, a shebang with a DOS newline on the end would have it trying to read `/bin/sh$'\r'` or `/bin/bash$'\r'` or such, thus a "file not found". And if it *is* a CRLF newline, the OP may not know that it's there. – Charles Duffy Feb 10 '16 at 17:58
good point @CharlesDuffy. There is a good answer about how to fix the DOS carriage returns here: [Stackoverflow removing DOS carriage returns](http://stackoverflow.com/questions/2613800/how-to-convert-dos-windows-newline-crlf-to-unix-newline-n-in-bash-script) - jsc you may want to check this out – Caleb Adams Feb 10 '16 at 18:01
running `bash -x ./scriptname myfolder` gives the error `grep: myfolder/*: No such file or directory` `0` it looks like it's seeing the * symbol as a file itself maybe. the folder does contain about 800 .dat files which should be readable, or at least they were using just the grep line before I added the looping functionality – jsc Feb 10 '16 at 19:05
@jsc, if that was the exact error text you got with your original code, you should have included that text with your original question. – Charles Duffy Feb 10 '16 at 21:27
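
A minimal sketch for checking the CRLF theory raised in these comments. Both helper names (`has_crlf_shebang`, `strip_cr`) are hypothetical; the sketch only relies on `head`, `grep` and `tr`.

```shell
# Hypothetical helper: succeed if the first line of a script ends in a carriage return.
has_crlf_shebang() {
  cr=$(printf '\r')
  # Match a carriage return immediately before the end of the first line.
  head -n 1 "$1" | grep -q "${cr}\$"
}

# Hypothetical helper: write a copy of the script ($1) to $2 with every carriage return removed.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}
```

For example, `has_crlf_shebang ./myscript.sh && strip_cr ./myscript.sh ./myscript.unix.sh` would produce a cleaned copy only when the shebang line is affected.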

jgreve · Accepted Answer · 2016-02-10T18:15:01.950

You need to add a wildcard match to your target directory.
Compare the two for loops in foo.sh ("original" is commented out).

Also I modified the grep command to just be an echo so you have an
easy way to preview what it is going to attempt to execute.

sample output from foo.sh

edit: I added -r (below, in foo.sh) to check for read permission on dir $1. Without that check, a missing read permission is hard to troubleshoot: when I tested against a no-read directory, the script ran as if nothing was in the directory (even though it had a *.java and a *.class file, as shown above).

To be more clear, running against a non-readable dir looks like this (this is without the -r check):

$ chmod a-r tmp
$ ./foo.sh tmp
$1="tmp"
processing tmp/* file
grep -o '<myphrase>' "tmp/*" | wc -l
$

Note the "processing tmp/* file" line above. The for loop feeds the literal characters "tmp/*" into the $f variable.
That is exactly how wildcard expansion is supposed to work when the pattern doesn't match anything.
But we're not checking grep's exit status, so it could be hard
to notice grep complaining about "file tmp/* not found".
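
One way to make that failure visible is to look at grep's exit status: 0 means a match was found, 1 means no match, and anything greater means an error such as a missing file. A minimal sketch, where `grep_status` is a hypothetical helper name:

```shell
# Hypothetical helper: classify the outcome of grepping one file.
# $1 = pattern, $2 = file
grep_status() {
  # -q suppresses output; stderr is discarded so only the status is reported.
  grep -q -- "$1" "$2" 2>/dev/null
  case $? in
    0) echo "match" ;;
    1) echo "no match" ;;
    *) echo "error" ;;
  esac
}
```

With an unexpanded pattern, `grep_status myphrase "tmp/*"` reports `error` rather than silently counting zero.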

foo.sh output (revised)

$ chmod a+r tmp
$ ./foo.sh
$1=""
Error: no directory specified.
$ ./foo.sh foo.sh
$1="foo.sh"
Error: "foo.sh" is not a dir.
$ ./foo.sh tmp
$1="tmp"
processing tmp/Foo.class file
grep -o '<myphrase>' "tmp/Foo.class" | wc -l
processing tmp/Foo.java file
grep -o '<myphrase>' "tmp/Foo.java" | wc -l
$ chmod a-r tmp
$ ./foo.sh tmp
$1="tmp"
Error: no read permissions on dir "tmp".
$

foo.sh

#!/bin/bash

echo "\$1=\"$1\""
if [ -z "$1" ]; then
   echo "Error: no directory specified."
   exit 1
fi
if [ ! -e "$1" ]; then
   echo "Error: dir \"$1\" does not exist."
   exit 1
fi
if [ ! -d "$1" ]; then
   echo "Error: \"$1\" is not a dir."
   exit 1
fi
if [ ! -r "$1" ]; then
   echo "Error: no read permissions on dir \"$1\"."
   exit 1
fi
# maybe default to "." if $1 is empty ?
# original: for f in "$1"
for f in "$1"/*
do
  echo "processing $f file"
  echo "grep -o '<myphrase>' \"$f\" | wc -l"
  # maybe  change myphrase to $2 ?
done

Also, as pointed out elsewhere, you are writing your own version of the "find" command. Another possibility follows...

So what about find ?

I'm adding this just to encourage you to look at the find command at some point.

Disclaimer: Writing your own script is perfectly ok. There is value in the research and understanding how to roll-your-own.

find is indeed complex, but it is well worth climbing find's learning curve for the long run.

Note that 99% of the following is just comments.

ezfind.sh (an example)

#!/bin/bash

# example usage:
# ezfind.sh "$HOME/my_dir"  "foo.*bar"
#  "$1" is the start point, any directory path (relative or absolute).
#  -type f limits matches to regular files (e.g. you probably don't
#  want to run grep on dirs or devices).
#  -exec args are funky, see below.
#  Optional: see bottom of this script for notes about -maxdepth
#  to limit how deep find will search.
#------------------------------------------------------
find "$1" -type f -exec grep -i -o   "$2" '{}' ';'
#                       \________/  \___/ \__/ \_/
#                           |        |     |   |
#   command ----------------+        |     |   |
#   pattern for grep ----------------+     |   |
#   find replaces {} w/filename------------+   |
#   find expects a semicolon for end-of-cmd----+

# Quoting is funky for -exec arguments.
# The values have to survive current bash interpretation,
# so they can be passed to find's argument list.
# Then find turns around and passes them to grep's argument list.
# The semicolon is normally a bash statement separator so we
# need to quote it (or escape it) so it gets passed to find
# as part of the arg list.

# find -exec will replace {} with current filename.
# find gives you a crazy amount of file name, file type and date range options.
#    search for all file names in /what/ever matching "*.txt"
#    search for all file names matching "*.sh" modified in the last hour.
# for more on find, see here:
# http://www.softpanorama.org/Tools/Find/using_exec_option_and_xargs_in_find.shtml
# 
# What if I don't want to search every sub-folder, all the way down?
# To avoid traversing subdirectories, consider the -maxdepth option
# (supported by GNU and BSD find; it goes before tests such as -type).
# To search just the files directly inside the target directory:
#     find "$1" -maxdepth 1  ....
# To also search its immediate subdirectories:
#     find "$1" -maxdepth 2  ....
# (-maxdepth 0 would test only "$1" itself, so -type f would match nothing.)
# For a nice summary of -maxdepth, see here:
#     http://www.tech-recipes.com/rx/31/limit-the-depth-of-search-using-find/
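
For comparison, here is a sketch of how find could reproduce the per-file counts from the question without descending into subdirectories. The function name `count_with_find` is hypothetical, and `-maxdepth` is a GNU/BSD extension rather than POSIX, so treat this as one possible approach under those assumptions.

```shell
# Hypothetical helper: count occurrences of a phrase in each regular file
# directly inside a directory. $1 = directory, $2 = phrase.
count_with_find() {
  # The phrase is passed as the first positional argument of the inner shell,
  # followed by the batch of file names that find substitutes for {}.
  find "$1" -maxdepth 1 -type f -exec sh -c '
    phrase=$1
    shift
    for f in "$@"; do
      n=$(grep -o -- "$phrase" "$f" | wc -l | tr -d " ")
      printf "%s: %s\n" "$f" "$n"
    done
  ' sh "$2" {} +
}
```

find does not promise any particular output order, so pipe the result through `sort` if the order matters.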

thank you for the kind words. I don't want to undermine your script effort, absolutely keep plugging on that and make it work. You can read up on find some day later. — jgreve, Feb 10 '16 at 17:54
Really nice comprehensive response - thank you for that! And i will look into find, it's just i used this method for another part of this task so keeping it the same where possible. Have managed to get the command to recognise the folder now, however am receiving this error, it seems like it is trying to read `*` as a file almost? `processing myfolder/* file` `grep: myfolder/*: No such file or directory` `0` — jsc, Feb 10 '16 at 17:55
Sorry for deleting comment and re-replying, new to this site and didn't realise return would post rather than let me start a new line :p — jsc, Feb 10 '16 at 17:56
ah, funny you mention "myfolder/*: no such file or directory". I just added some code to check for no read permissions on the dir. Can you do an "ls -ld myfolder" and edit your question to show the results? (-l for long listing, shows permissions, and -d to prevent ls from showing myfolder's contents... I want to start with myfolder itself). — jgreve, Feb 10 '16 at 18:12
also you could try using a full path to myfolder, e.g. /home/jsc/myfolder or wherever it is. May be more stable to run that way when invoking your script, doesn't require caller to cd to a particular location first. — jgreve, Feb 10 '16 at 18:21
ls on myfolder returns no such file or directory, but this, I think, is because i am located in the folder itself (with the script too). running from its parent folder gives: `drwxr-xr-x@ 829 JosephCarden staff 28186 7 Feb 17:19 myfolder` — jsc, Feb 10 '16 at 18:22
@jsc, *of course* paths are relevant to the current working directory. If it doesn't work for `ls`, it won't work for your script. So, if you used `../myfolder`, you'd be fine. — Charles Duffy, Feb 10 '16 at 18:32
problem is the script must be inside myfolder. and annoyingly the command must be called using `./myscript.sh myfolder` — jsc, Feb 10 '16 at 18:35
Moving the script outside of the folder works now. However the problem i face is that my specification requires me to run it from inside — jsc, Feb 10 '16 at 19:52
how about just "." instead of myfolder, e.g.: $ ./myscript.sh . — jgreve, Feb 10 '16 at 20:06
@jsc, I promise you that whoever wrote the specification did not intend you to take all those components of it literally (ie. did not mean you to need to run it with `myfolder` as an argument from inside of `myfolder`). I'll literally place money on it (PayPal? Google Pay? Your choice). Send an email to the TA, CC me, if they respond that they really did intend that usage, I'll pay up. — Charles Duffy, Feb 10 '16 at 21:28
@jsc, ...mind you, by the above, I *literally* mean `myfolder`, as opposed to `.` or `../myfolder` or `/path/from/the/root/to/myfolder`, all of which make sense. — Charles Duffy, Feb 10 '16 at 23:01
@CharlesDuffy: I agree, sounds like the spec is broken. The only possible way I can imagine this working is with a subdirectory in myfolder called "myfolder". e.g. you had something like /path/from/the/root/to/myfolder/myfolder. So jsc, is this for work or for a class? — jgreve, Feb 11 '16 at 00:15
Just an update for you guys. The script does now work from outside the folder. And my professor has changed the spec to make clear that the script should run from the level outside the folder. So the problem was in part due to a badly written spec. Thanks for all your patience with a newbie - much appreciated! — jsc, Feb 11 '16 at 19:02
Glad to hear it worked out. Poorly written specs are more common than not, so that may be the real take home lesson. Happy scripting. :-) — jgreve, Feb 11 '16 at 21:38
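
The root cause discussed in this thread, a relative path being resolved against the current working directory, can be illustrated with a small sketch. `check_dir_arg` is a hypothetical helper name, not something from the original script.

```shell
# Hypothetical helper: report how a directory argument resolves from the
# current working directory.
check_dir_arg() {
  if [ -d "$1" ]; then
    # The command substitution runs in a subshell, so the caller's directory
    # is unchanged. CDPATH is cleared so cd cannot resolve the name elsewhere.
    echo "ok: $1 resolves to $(CDPATH= cd -- "$1" && pwd -P)"
  else
    echo "missing: $1 (looked in $(pwd -P))"
  fi
}
```

Run from inside myfolder, `check_dir_arg myfolder` reports the directory as missing, while `check_dir_arg .` and `check_dir_arg ../myfolder` both resolve to it.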

score 0 · Answer 3 · answered Feb 16 '16 at 18:51

From the Directory details, it seems that there is a space in the directory name:

drwxr-xr-x@ 829 user staff 28186 7 Feb 17:19 my folder

try to run the script as below or remove the space from the directory name :)

./myscript.sh "my folder"
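
A quick sketch of why the quotes matter, using a hypothetical `count_args` helper: without quotes the shell splits the name into two arguments, so `$1` would be just `my`.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() {
  echo "$#"
}

count_args my folder      # unquoted: two arguments, "my" and "folder"
count_args "my folder"    # quoted: one argument containing a space
```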

Mayuresh

Maxim Egorushkin · Answer 4 · 2016-02-10T17:02:58.557

-1

You can just do:

find <directory> -type f -exec egrep -c '<regular expression>' {} +

Note that there is no need to check for file existence because find does not find non-existing files. Also note that find traverses sub-directories found (tunable behaviour with -maxdepth 1).

E.g.:

find /usr/include -type f -exec egrep -c 'int128' {} +

edited Feb 10 '16 at 17:02

answered Feb 10 '16 at 16:53


Using `find` is wrong if there are subdirectories which are not supposed to be traversed. This can be worked around, but the simplest workaround is to not use `find`. Also, you haphazardly replaced the `grep -o | wc -l` with `egrep -c`; they do not do the same thing (`grep -c` counts the number of matching *lines,* not how many times the actual string occurs, and `egrep` additionally uses a different regex dialect). So I'm downvoting for too many unrelated and possibly erroneous changes. – tripleee Feb 10 '16 at 17:01
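
The difference between the two counting methods mentioned in this comment can be seen with a small sketch. Both helper names are hypothetical.

```shell
# Hypothetical helper: number of LINES containing the pattern (grep -c).
count_lines() {
  grep -c -- "$1" "$2"
}

# Hypothetical helper: number of OCCURRENCES of the pattern (grep -o | wc -l).
count_occurrences() {
  grep -o -- "$1" "$2" | wc -l | tr -d ' '
}
```

On a file whose lines are `foo foo`, `bar` and `foo`, the first helper reports 2 matching lines and the second reports 3 occurrences.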