437

Could somebody please provide the code to do the following: assume there is a directory of files, all of which need to be run through a program. The program outputs the results to standard output. I need a script that will go into a directory, execute the command on each file, and concatenate the output into one big output file.

For instance, to run the command on 1 file:

$ cmd [option] [filename] > results.out
codeforester
themaestro
  • 3
    I would like to add to the question. Can it be done using xargs? e.g., `ls | xargs cmd [options] {filenames put in here automatically by xargs} [more arguments] > results.out` – Ozair Kafray May 09 '12 at 20:16
  • 3
    It can, but you probably [don't want to use `ls`](http://mywiki.wooledge.org/ParsingLs) to drive `xargs`. If `cmd` is at all competently written, perhaps you can simply do `cmd `. – tripleee Oct 31 '17 at 04:42
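
If cmd itself accepts multiple filename arguments, the shell glob can be passed to it directly, as the comment above suggests. A minimal sketch, treating cmd and [option] as placeholders for your actual program and flags:

$ cmd [option] /dir/* > results.out

This runs cmd once with all the filenames, rather than once per file, so it only applies when the program can handle a list of files in a single invocation.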

10 Answers

609

The following bash code will pass $file to the command, where $file represents each file in /dir:

for file in /dir/*
do
  cmd [option] "$file" >> results.out
done

Example

el@defiant ~/foo $ touch foo.txt bar.txt baz.txt
el@defiant ~/foo $ for i in *.txt; do echo "hello $i"; done
hello bar.txt
hello baz.txt
hello foo.txt
Wayne Werner
Andrew Logvinov
  • 37
    If no files exist in `/dir/`, then the loop still runs once with a value of '*' for `$file`, which may be undesirable. To avoid this, enable nullglob for the duration of the loop. Add this line before the loop `shopt -s nullglob` and this line after the loop `shopt -u nullglob #revert nullglob back to its normal default state`. – Stew-au Sep 19 '12 at 07:38
  • If the output file is the same inside the loop, it's much more efficient to redirect outside the loop `done >results.out` (and probably then you can overwrite instead of append, like I have assumed here). – tripleee Oct 31 '17 at 04:37
  • How do you get individual results files which are custom named to their input files? – Timothy Swan Nov 13 '17 at 14:51
  • @TimothySwan https://stackoverflow.com/questions/28725333/looping-over-pairs-of-values-in-bash – tripleee Jan 21 '18 at 10:13
  • But how do you sequence which file to execute first, second, third? You can generally run the commands from the various files in any order, but order / sequence matters. – indianwebdevil Jan 18 '19 at 08:47
  • 3
    Be careful using this command for a huge number of files in a directory. Use find -exec instead. – kolisko Feb 28 '19 at 14:22
  • 2
    "be carefull by using this command for huge amount of files in dir. Use find -exec instead". But why? – That Brazilian Guy Oct 16 '20 at 23:35
  • How would I do it if I want one command for all files joined by space? Like `cmd file1 f2 f3` etc. The files come from a dir. – Timo Nov 22 '20 at 20:23
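
Putting together the nullglob and single-redirect suggestions from the comments above, a minimal sketch of the same loop:

shopt -s nullglob
for file in /dir/*
do
  cmd [option] "$file"
done > results.out
shopt -u nullglob   # restore the default globbing behaviour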
254

How about this:

find /some/directory -maxdepth 1 -type f -exec cmd option {} \; > results.out
  • -maxdepth 1 argument prevents find from recursively descending into any subdirectories. (If you want such nested directories to get processed, you can omit this.)
  • -type f specifies that only plain files will be processed.
  • -exec cmd option {} tells it to run cmd with the specified option for each file found, with the filename substituted for {}
  • \; denotes the end of the command.
  • Finally, the output from all the individual cmd executions is redirected to results.out

However, if you care about the order in which the files are processed, you might be better off writing a loop. I think find processes the files in inode order (though I could be wrong about that), which may not be what you want.
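
If the order does matter, one hedged way to impose it (assuming GNU find and sort, which provide -print0 and -z) is to sort the NUL-delimited file list before running the command:

find /some/directory -maxdepth 1 -type f -print0 | sort -z |
  while IFS= read -r -d '' file; do
    cmd option "$file"
  done > results.out

This sorts by filename; the read -d '' trick requires bash.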

Neithan Max
Jim Lewis
  • 2
    This is the correct way to process files. Using a for loop is error-prone for many reasons. Also, sorting can be done with other commands such as `stat` and `sort`, which of course depends on the sorting criteria. – tuxdna Dec 25 '13 at 08:10
  • 2
    If I wanted to run two commands, how would I link them after the `-exec` option? Do I have to wrap them in single quotes or something? – frei Nov 20 '17 at 08:28
  • `find` is always the best option because you can filter by file name pattern with the `-name` option, and you can do it in a single command. – João Pimentel Ferreira Dec 07 '17 at 18:56
  • 8
    @frei the answer to your question is here: https://stackoverflow.com/a/6043896/1243247 but basically just add `-exec` options: `find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;` – João Pimentel Ferreira Dec 07 '17 at 19:02
  • 3
    how can you reference the file name as option? – Toskan Apr 10 '19 at 16:24
  • Chaining `-exec` commands together works fine as long as the first `-exec` exits successfully. If it fails, the second `-exec` won't run. @Toskan the filename is referenced as `{}`. – mazunki Jul 15 '20 at 06:37
  • In a recursive search, without the -maxdepth argument, if I run a command that produces a file (e.g. convert), how do I keep the new files in the same folder as the original? Also, how do I limit the search to a particular file type? – To Do May 10 '23 at 13:23
123

I'm doing this on my Raspberry Pi from the commandline by running:

for i in *; do cmd "$i"; done
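
To match the original question's goal of one combined output file, the same one-liner just needs a redirect (cmd standing in for your actual program):

for i in *; do cmd "$i"; done > results.out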
Richie Bendall
robgraves
  • 6
    While [this answer](https://stackoverflow.com/a/10523492/114558) is probably the "right" way to do this in a production environment, for day-to-day usage convenience, this one-liner wins! – rinogo Apr 09 '21 at 15:31
  • If one wants to use the modified filename as an argument (e.g. for the name of the output file), you can add anything after the `$i` part, and you will have a new string. Example of an imaginary command `ppp -i raw.txt -o processed.txt` would be: `for i in *; do ppp -i "$i" -o "$i changed"; done` - this will do the `ppp` command on every file and the resulting file for each execution will be named like the input file, with addition of " changed" at the end. – Aleksandar Jan 09 '23 at 10:43
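
Along the same lines, if the goal is one output file per input file rather than a single results.out, bash parameter expansion can derive the output name from the input name. A small sketch, assuming the inputs end in .txt and cmd writes to stdout:

for i in *.txt; do cmd "$i" > "${i%.txt}.out"; done

Here ${i%.txt} strips the .txt suffix, so foo.txt produces foo.out.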
21

You can use xargs:

ls | xargs -L 1 -d '\n' your-desired-command 
  • -L 1 makes xargs pass one item at a time.

  • -d '\n' splits the output of ls on newlines.

KyleMit
Al Mamun
  • 1
    Using xargs is nice because it allows you to run your-desired-command in parallel if you add the `-P 8` flag (up to 8 processes at the same time). – Nick Crews Aug 22 '22 at 21:05
  • 2
    For macOS, the `-d` option isn't available. You can fix it by `brew install findutils` first and then use `gxargs` instead of `xargs` – Wit Nov 16 '22 at 08:39
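
A more robust variant of the same idea (a sketch, assuming GNU findutils or the BSD tools, which support -print0/-0, -n and -P) avoids parsing ls entirely and feeds xargs a NUL-delimited list, optionally in parallel as mentioned in the comments:

find . -maxdepth 1 -type f -print0 | xargs -0 -n 1 -P 8 your-desired-command > results.out

-print0 and -0 keep filenames with spaces or newlines intact, -n 1 passes one file per invocation, and -P 8 runs up to eight invocations at a time (note that parallel runs may interleave their output in results.out).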
18

The accepted/high-voted answers are great, but they are missing a few nitty-gritty details. This post covers how to handle the cases where the shell pathname expansion (glob) fails, where filenames contain embedded newlines or leading dashes, and how to move the command output redirection out of the for loop when writing the results to a file.

When the shell glob expansion using * finds nothing because there are no files in the directory, the expansion fails and the unexpanded glob string is passed to the command being run, which could have undesirable results. The bash shell provides an extended shell option, nullglob, for exactly this. So, inside the directory containing your files, the loop basically becomes:

 shopt -s nullglob

 for file in ./*; do
     cmdToRun [option] -- "$file"
 done

This lets the for loop exit safely when the expression ./* doesn't match any files (i.e. the directory is empty),

or, in a POSIX-compliant way (nullglob is bash-specific):

 for file in ./*; do
     [ -f "$file" ] || continue
     cmdToRun [option] -- "$file"
 done

This way the loop body does run once when the expansion fails, but the condition [ -f "$file" ] checks whether the unexpanded string ./* is an actual file in that directory, which it isn't. On that failure, continue skips the rest of the loop body, and since there is nothing else to iterate over, the loop ends.

Also note the usage of -- just before passing the filename argument. This is needed because, as noted previously, filenames can begin with a dash; without it, some commands interpret such a name as a command option and execute as if that flag had been provided.

The -- signals the end of command-line options, which means the command should not parse anything beyond that point as a flag, only as filenames.
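
A toy demonstration of the difference (an illustrative example of my own, using cat as the command):

printf 'hello\n' > ./-n    # create a file whose name starts with a dash
cat -n                     # cat treats -n as its "number lines" flag and waits on stdin
cat -- -n                  # with --, cat opens the file named -n and prints "hello"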


Double-quoting the filenames properly solves the cases where the names contain glob characters or whitespace. But *nix filenames can also contain newlines, so we delimit filenames with the only character that cannot be part of a valid filename: the null byte (\0). Since bash internally uses C-style strings, in which the null byte marks the end of a string, it is the right candidate for this.

So, using the shell's printf to delimit the files with this null byte, and the -d option of the read command to split on it, we can do the following:

( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
    cmdToRun [option] -- "$file"
done

The nullglob and the printf are wrapped in (..), which means they run in a sub-shell (child shell), so that the nullglob option does not carry over to the parent shell once the command exits. The -d '' option of the read command is not POSIX-compliant, so bash is needed for this. Using the find command, the same can be done as:

while IFS= read -r -d '' file; do
    cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0)

For find implementations that don't support -print0 (implementations other than GNU and FreeBSD), it can be emulated using printf:

find . -maxdepth 1 -type f -exec printf '%s\0' {} \; | xargs -0 cmdToRun [option] --

Another important fix is to move the redirection out of the for loop to reduce file I/O. With the redirection inside the loop, the shell has to execute system calls twice per iteration, once to open and once to close the file descriptor associated with the output file. This becomes a performance bottleneck for large numbers of iterations, so the recommendation is to move it outside the loop.

Extending the above code with these fixes, you could do:

( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
    cmdToRun [option] -- "$file"
done > results.out

which sends the command output for each iteration of your file input to stdout, and opens the target file only once for the entire loop rather than once per iteration. The equivalent find version of the same would be:

while IFS= read -r -d '' file; do
    cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0) > results.out
Lorenz Meyer
Inian
  • 1
    +1 for checking that the file exists. If searching in a non-existent dir, $file contains the unexpanded glob string "/invald_dir/*", not a valid filename. – cdalxndr Feb 19 '20 at 18:14
6

One quick and dirty way which gets the job done sometimes is:

find directory/ | xargs  Command 

For example, to count the number of lines in all files in the current directory, you can do:

find . | xargs wc -l
Rahul
  • 8
    @Hubert Why do you have newlines in your filenames?! – musicin3d Dec 01 '18 at 20:37
  • 3
    it's not a question of "why", it's a question of correctness – filenames don't have to consist of printable characters, and they don't even have to be valid UTF-8 sequences. Also, what counts as a newline is very much encoding-dependent; one encoding's ♀ is another's newline. See code page 437. – Hubert Kario Dec 07 '18 at 12:23
  • 3
    cmon, really? this does work 99.9% of the time, and he did say "quick and dirty" – Edoardo Jan 24 '19 at 14:22
  • 1
    I am not a fan of "quick and dirty" (AKA "broken") Bash scripts. Sooner or later it ends in things like the famous "Moved `~/.local/share/steam`. Ran steam. It deleted everything on system owned by user." bug report. – reducing activity Jan 26 '19 at 18:31
  • This also won't work with files that have spaces in the name. – Shamas S Sep 12 '19 at 07:21
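
A less dirty version of the same idea (assuming find and xargs support the -print0/-0 pair, as the GNU and BSD versions do) survives spaces and newlines in filenames:

find . -type f -print0 | xargs -0 wc -l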
1

Based on @Jim Lewis's approach:

Here is a quick solution using find that also sorts files by their modification date:

$ find  directory/ -maxdepth 1 -type f -print0 | \
  xargs -r0 stat -c "%y %n" | \
  sort | cut -d' ' -f4- | \
  xargs -d "\n" -I{} cmd -op1 {} 

For sorting see:

http://www.commandlinefu.com/commands/view/5720/find-files-and-list-them-sorted-by-modification-time

tuxdna
  • this will not work if the files have newlines in their names – Hubert Kario Nov 12 '18 at 12:04
  • 2
    @HubertKario You may want to read more about `-print0` for `find` and `-0` for `xargs`, which use the null character instead of whitespace (including newlines). – tuxdna Nov 15 '18 at 04:35
  • yes, using `-print0` helps, but the whole pipeline needs to use something like it, and `sort` here isn't – Hubert Kario Nov 15 '18 at 18:02
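
To address the newline concern, the whole pipeline can stay NUL-delimited (a sketch, assuming GNU find, sort and cut, which support NUL-terminated records): sort on a leading modification timestamp and strip it before the names reach the command:

find directory/ -maxdepth 1 -type f -printf '%T@ %p\0' | \
  sort -z -n | cut -z -d' ' -f2- | \
  xargs -0 -I{} cmd -op1 {}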
1

I needed to copy all .md files from one directory into another, so here is what I did.

for i in **/*.md;do mkdir -p ../docs/"$i" && rm -r ../docs/"$i" && cp "$i" "../docs/$i" && echo "$i -> ../docs/$i"; done

Which is pretty hard to read, so let's break it down.

First, cd into the directory with your files,

for i in **/*.md; for each file matching your pattern

mkdir -p ../docs/"$i" make that directory path inside a docs folder outside of the folder containing your files. This also creates an extra directory with the same name as the file.

rm -r ../docs/"$i" remove the extra directory created as a result of mkdir -p

cp "$i" "../docs/$i" Copy the actual file

echo "$i -> ../docs/$i" Echo what you did

; done Live happily ever after
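
A slightly simpler variant of the same idea (my own sketch; note that the ** pattern needs bash's globstar option, enabled with shopt -s globstar) creates only the parent directory instead of making and then removing a directory named after each file:

shopt -s globstar
for i in **/*.md; do
  mkdir -p "../docs/$(dirname "$i")" && cp "$i" "../docs/$i" && echo "$i -> ../docs/$i"
done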

Eric Wooley
1

Maxdepth

I found this works nicely with Jim Lewis's answer; just add a bit, like this:

$ export DIR=/path/dir && cd $DIR && chmod -R +x *
$ find . -maxdepth 1 -type f -name '*.sh' -exec {} \; > results.out

Sort Order

If you want to execute in sort order, modify it like this:

$ export DIR=/path/dir && cd $DIR && chmod -R +x *
find . -maxdepth 2 -type f -name '*.sh' | sort | bash > results.out

Just as an example, this will execute in the following order:

bash: 1: ./assets/main.sh
bash: 2: ./builder/clean.sh
bash: 3: ./builder/concept/compose.sh
bash: 4: ./builder/concept/market.sh
bash: 5: ./builder/concept/services.sh
bash: 6: ./builder/curl.sh
bash: 7: ./builder/identity.sh
bash: 8: ./concept/compose.sh
bash: 9: ./concept/market.sh
bash: 10: ./concept/services.sh
bash: 11: ./product/compose.sh
bash: 12: ./product/market.sh
bash: 13: ./product/services.sh
bash: 14: ./xferlog.sh

Unlimited Depth

If you want to execute at unlimited depth when a certain condition is met, you can use this:

export DIR=/path/dir && cd $DIR && chmod -R +x *
find . -type f -name '*.sh' | sort | bash > results.out

then put this at the top of each file in the child directories:

#!/bin/bash
[[ "$(dirname `pwd`)" == $DIR ]] && echo "Executing `realpath $0`.." || return

and somewhere in the body of the parent file:

if <a condition is matched>
then
    #execute child files
    export DIR=`pwd`
fi
eQ19
-1

I think the simple solution is:

sh /dir/* > ./result.txt
yovie
  • 3
    Did you understand the question correctly? This will merely try to run each file in the directory through the shell - as if it were a script. – rdas Apr 16 '19 at 08:44