
I have one script that only writes data to stdout. I need to run it for multiple files and generate a different output file for each input file, and I was wondering how to use find -exec for that. So I basically tried several variants of this (I replaced the script with cat just for testing purposes):

find * -type f -exec cat "{}" > "{}.stdout" \;

but could not make it work, since all the data was being written to a single file literally named {}.stdout.

Eventually, I could make it work with:

find * -type f -exec sh -c "cat {} > {}.stdout" \;

But while this latter form works well with cat, my script requires environment variables loaded through several initialization scripts, so I end up with:

find * -type f -exec sh -c "initscript1; initscript2; ...;  myscript {} > {}.stdout" \;

This seems wasteful, because I have everything already initialized in my current shell.

Is there a better way of doing this with find? Other one-liners are welcome.
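To see why the first attempt writes everything to one file: the > redirection is parsed by the invoking shell before find ever runs, so all the -exec invocations share a single output file literally named {}.stdout. A minimal reproduction (in a throwaway temp directory; the ! -name guard is added here only so the shared output file is not itself re-read by find):

```shell
#!/bin/sh
set -e
dir=$(mktemp -d); cd "$dir"
echo a > f1
echo b > f2
# The invoking shell consumes the > redirection before find starts,
# so the entire find run shares ONE file literally named "{}.stdout".
find . -type f ! -name '*.stdout' -exec cat {} \; > '{}.stdout'
ls
sort '{}.stdout'
```

Both files' contents end up concatenated into the one literal {}.stdout, which is exactly the symptom described above.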

phuclv
jserras
    If they are initialized in your original shell, but not set in the subshell, then they are not environment variables. Write `set -a` at the top of your initscripts. – William Pursell Feb 22 '13 at 18:23
  • Is the last example you give correct, or is the command: `find . -type f -exec sh -c ". initscript1; . initscript2; ...; myscript {} > {}.stdout" \;`? (Instead of simply invoking `initscript1`, are you actually calling `. initscript1`, i.e., sourcing the file with the dot command?) – William Pursell Feb 22 '13 at 18:30
  • See also https://superuser.com/questions/1327969/appending-new-lines-to-multiple-files/1327980#1327980 – mems Nov 04 '19 at 15:43

3 Answers


You can do it with eval. It may be ugly, but so is having to make a shell script for this. Plus, it's all on one line. For example:

find -type f -exec bash -c "eval md5sum {}  > {}.sum " \;
Phillip Jones
    The `bash -c` is the beef here, the `eval` isn't actually doing anything useful. But you are not avoiding the shell. – tripleee Mar 21 '17 at 16:00
  • If you take out the `eval` I'm thinking this should be the accepted answer actually, even though the OP would like to avoid a shell. (Putting a script in a separate file creates a shell when running that script anyway. What the OP is asking isn't really possible.) – tripleee Mar 21 '17 at 16:02
    The `eval` is actively dangerous here. If you have a file name that contains `$(rm -rf $HOME)`, this is going to be **very** bad news. – Charles Duffy Apr 05 '17 at 17:27
  • @tripleee, even without the `eval` this is still dangerous, because you're running your filenames through `bash -c`. *With* the `eval`, you're evaluating each filename through the shell parser twice; without it, you're evaluating it once. The only acceptable number of times data is parsed as code is from a security perspective *zero*. – Charles Duffy Apr 05 '17 at 19:44
    So a secure rephrase would be `find -type f -exec bash -c 'for f; do md5sum "$f" >"$f.sum"; done' _ +` but that won't avoid the shell either, of course (and basically duplicates the existing answer by @CharlesDuffy). – tripleee Apr 06 '17 at 04:23

A simple solution would be to put a wrapper around your script:

#!/bin/sh

myscript "$1" > "$1.stdout"

Call it myscript2 and invoke it with find:

find . -type f -exec myscript2 {} \;

Note that although most implementations of find allow you to do what you have done, technically the behavior of find is unspecified if you use {} more than once in the argument list of -exec.
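One way to stay within the specification is to pass {} only once, as a positional parameter, and let the child shell expand "$1" wherever it is needed. A sketch (with cat standing in for myscript, run in a throwaway temp directory):

```shell
#!/bin/sh
set -e
dir=$(mktemp -d); cd "$dir"
echo data > 'file with spaces'
# {} appears exactly once; the child shell reuses "$1" with proper
# quoting, so spaces (or worse) in file names cannot break the command.
find . -type f ! -name '*.stdout' -exec sh -c 'cat "$1" > "$1.stdout"' sh {} \;
cat './file with spaces.stdout'
```

Quoting "$1" inside the child shell is also what prevents hostile file names from being parsed as shell code, unlike substituting {} directly into the sh -c string.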

Charles Duffy
William Pursell
    But in `find` manual, somewhere in `-exec` it is said that: _The string '{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find._ [link](http://unixhelp.ed.ac.uk/CGI/man-cgi?find). Still, thanks for the workaround. – jserras Feb 22 '13 at 22:14
    The manual for your particular implementation of `find` state that it works, but the standard reads: `If more than one argument containing only the two characters "{}" is present, the behavior is unspecified.` It's not a big deal, but is something that can burn you (at which point it suddenly becomes a big deal!) – William Pursell Feb 22 '13 at 22:32
    A more important disadvantage is that things like `-exec sh -c "myscript {} > {}.stdout" \;` can cause arbitrary code execution in the face of hostile file names. It is more secure to do `-exec sh -c 'myscript "$1" > "$1.stdout"' sh {} \;`. – jilles Feb 22 '13 at 23:50
  • Had someone come up with a very similar question today, and I was disappointed to find that I couldn't find anything I really *liked* as a duplicate for it; hence the effort. If this question were in its current state a few hours ago (the `eval` approach not tied with the accepted answer, and the accepted answer not having the now-fixed quoting bug), I probably would have just closed the other as dupe. – Charles Duffy Apr 05 '17 at 18:04

If you export your environment variables, they'll already be present in the child shell. (If you use bash -c instead of sh -c, and your parent shell is itself bash, then you can also export functions in the parent shell and have them usable in the child; see export -f.)
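A quick sketch of that behavior (the variable and function names here are made up for illustration):

```shell
#!/usr/bin/env bash
# Exported variables become environment variables, visible to any child
# process; export -f (bash-specific) additionally serializes a function
# definition into the environment so a child *bash* can call it.
export GREETING="hello"
greet() { echo "$GREETING from greet"; }
export -f greet
bash -c 'greet'    # the child shell sees both the variable and the function
```

Note that export -f only works when both parent and child are bash; a plain sh -c child would see the variable but not the function.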

Moreover, by using -exec ... {} +, you can limit the number of shells to the smallest possible number needed to pass all arguments on the command line:

set -a # turn on automatic export of all variables
source initscript1
source initscript2

# pass as many filenames as possible to each sh -c, iterating over them directly
find * -name '*.stdout' -prune -o -type f \
  -exec sh -c 'for arg; do myscript "$arg" > "${arg}.stdout"; done' _ {} +

Alternately, you can just perform the execution in your current shell directly:

while IFS= read -r -d '' filename; do
  myscript "$filename" > "${filename}.stdout"
done < <(find * -name '*.stdout' -prune -o -type f -print0)

See UsingFind for a discussion of safely and correctly performing bulk actions through find, and BashFAQ #24 for the use of process substitution (the <(...) syntax) to ensure that operations are performed in the parent shell.

Charles Duffy
    Using `_` as $0 to the invoked sh is a bit obfuscating! – William Pursell Apr 05 '17 at 17:50
    @WilliamPursell, it's a common idiom -- can find links if you like. (`_` is also a conventional unused/placeholder value in some other languages, such as Python, but my understanding is that it was common in shell first). – Charles Duffy Apr 05 '17 at 18:01
  • I've seen it used in Go and Perl, but never in this setting. I tend to ignore it and set $0 to {}, which is probably a much worse practice! – William Pursell Apr 05 '17 at 18:40