130

Which one is more efficient over a very large set of files and should be used?

find . -exec cmd {} +

or

find . | xargs cmd

(Assume that there are no funny characters in the filenames)

codeforester
dogbane

3 Answers

122

The speed difference will be insignificant.

But you have to make sure that:

  1. Your script does not assume that filenames are free of spaces, tabs, and similar characters; the first version is safe here, the second is not.

  2. Your script does not treat a filename starting with "-" as an option.

So your code should look like this:

find . -exec cmd -option1 -option2 -- {} +

or

find . -print0 | xargs -0 cmd -option1 -option2 --

The first version is shorter and easier to write, since you can ignore point 1. The second version is more portable and safer: "-exec cmd {} +" is a relatively new option in GNU findutils (added in 2005, so many running systems will not have it yet), and it was buggy until recently. Also, many people do not know about "-exec cmd {} +", as you can see from the other answers.
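
As a rough way to see the difference in filename handling, the sketch below builds a scratch directory containing the awkward name from the comments and counts how many arguments each form hands to the command. The scratch directory, the file names, and the use of printf as a stand-in for cmd are assumptions made for illustration; -print0 and -0 are widely supported extensions, so check that your find and xargs accept them.

```shell
#!/bin/sh
# Scratch directory with one name that contains spaces (illustrative only).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -- 'foo -o index.html' plain.txt

# printf prints one line per argument it receives, so the line count
# equals the number of arguments the command was given.

# Newline-delimited pipe: xargs splits on whitespace, so the name is broken up.
unsafe=$(find . -type f | xargs printf '[%s]\n' | wc -l | tr -d ' ')

# NUL-delimited pipe: each name arrives as exactly one argument.
safe_xargs=$(find . -type f -print0 | xargs -0 printf '[%s]\n' | wc -l | tr -d ' ')

# -exec ... {} +: find passes the names directly, with no parsing step.
safe_exec=$(find . -type f -exec printf '[%s]\n' {} + | wc -l | tr -d ' ')

echo "unsafe=$unsafe safe_xargs=$safe_xargs safe_exec=$safe_exec"

cd / && rm -rf "$dir"
```

If the unsafe count is higher than the number of files, the plain pipe has split at least one name into several arguments.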

codeforester
Tometzky
  • 5
    -print0 is also a GNU find (and GNU xargs) option which is missing from a lot of non-Linux systems, so the portability argument isn't as valid. Using just -print and leaving the -0 off of xargs, however, *is* very portable. – dannysauer May 27 '09 at 20:30
  • 7
    The point is that without -print0 it does not work if a filename contains a space, tab, etc. This can be a security vulnerability: if there is a filename like "foo -o index.html", then -o will be treated as an option. Try this in an empty directory: "touch -- foo\ -o\ index.html; find . | xargs cat". You'll get: "cat: invalid option -- 'o'" – Tometzky May 28 '09 at 07:22
  • 2
    His example is a filename that contains a -. Without -print0, find will spit out ./foo -o index.html. So maybe starting with a - isn't a big deal, but the result is little changed, and on a multiuser system, could provide an attack vector if your script is world readable. – bobpaul Feb 02 '12 at 17:47
  • 2
    A note on something that tripped me up here: using `exec` will output results as they are found, whereas `xargs` will, it seems, wait until the entire directory is searched before writing to stdout. If you're trying this on a large directory and it seems that `xargs` isn't working, patience is advisable. – FarmerGedden Sep 20 '13 at 10:19
  • Regarding '--': http://unix.stackexchange.com/questions/147143/when-and-how-was-the-double-dash-introduced-as-an-end-of-options-delimiter – Roland Jan 28 '15 at 11:02
  • Since `find` always outputs filenames with a path (e.g. `./file`) I don't see why `--` would be needed in this case. It's true though that because of possible newline characters the zero-delimiter settings are necessary. – phk Mar 13 '16 at 15:39
  • @phk - What do you mean by `zero-delimiter settings`? – Motivated Feb 03 '19 at 06:29
  • @Tometzky - Do you mean to say that I would have to run the command `find . -exec cmd -- {} +` to avoid `-` being treated as an option? If I am unaware that there are files that begin with `-`, e.g. `-file1.txt`, the answer seems to suggest that I would need to use `-- {}`. – Motivated Feb 03 '19 at 06:32
  • @Tometzky - What is the purpose of `-0` in the command `xargs -0`? – Motivated Feb 03 '19 at 06:36
  • @bobpaul - What do you mean by it could provide an attack vector? – Motivated Feb 03 '19 at 06:38
  • @Motivated Files on most unices can have any byte in their name (a tab, a horizontal tab, space, minus, quote etc). The only byte which isn't allowed is byte `0`. The `-print0` option for find and `-0` for xargs force filenames to be separated with byte 0. You're right about `--`: although modern GNU find never produces filenames starting with `-`, it is a good idea in general to separate options and arguments in case find is replaced with some other program. – Tometzky Feb 03 '19 at 15:50
  • @Tometzky - Thanks. What do you mean by "forces file names to be separated with a byte 0"? Why is `-print0` not used with the first `find` command? To clarify the use of `--`, do you mean to say that it's always best to include it in the command? – Motivated Feb 03 '19 at 16:06
  • 2
    @Motivated Without `-print0` find returns filenames separated with newline, but newline can also be part of a filename, making it ambiguous. Byte 0 can't, so it is a safe separator. Yes - adding `--` to a command that supports it is a good practice when you can't control its arguments, even if not always strictly required or unsafe. – Tometzky Feb 04 '19 at 19:41
  • @Tometzky - Thanks. Do you mean to say that if I use the command `find . -exec echo {} +` without `-print0` or `--`, it would result in filenames separated by newlines as well as filenames that begin with `-`? – Motivated Feb 05 '19 at 18:24
  • @Tometzky - To minimize the number of comments, it would be great if you can join (https://chat.stackexchange.com/rooms/89297/find-exec-xargs) – Motivated Feb 05 '19 at 18:27
  • @Motivated I'm pretty sure that back in 2012, when I commented, there was another comment between @tometzky's comment and my own. The `--` in the example should be generally safe, but tricky filenames can still cause unexpected behavior. Here's a pastebin showing where things can go wrong without using `-print0`: https://pastebin.com/3irfB3tW – bobpaul Feb 06 '19 at 20:33
9
find . | xargs cmd

is more efficient (it runs cmd as few times as possible, unlike exec, which runs cmd once for each match). However, you will run into trouble if filenames contain spaces or funky characters.

The following is recommended instead:

find . -print0 | xargs -0 cmd

This works even if filenames contain funky characters (-print0 makes find print NUL-terminated matches, and -0 makes xargs expect that format).
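
How often cmd is actually started can be checked directly. The sketch below is only an illustration under assumed conditions: five empty files in a scratch directory, and a small `sh -c 'echo run'` wrapper standing in for cmd so that each process start prints exactly one line.

```shell
#!/bin/sh
# Scratch directory with a handful of files (illustrative only).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a b c d e

# Each started process prints one "run" line regardless of its arguments,
# so the line count is the number of times the command was executed.

# -exec ... {} \;  starts the command once per file.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# -exec ... {} +  collects the names and starts the command once per batch.
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# xargs also batches the names it reads from the pipe.
via_xargs=$(find . -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "per_file=$per_file batched=$batched via_xargs=$via_xargs"

cd / && rm -rf "$dir"
```

On a very large file set, the batched forms will start more than one process once the argument list reaches the system limit, but the count stays far below one per file.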

SamB
ASk
  • 33
    This is not "find . -exec cmd {} \;" but "find . -exec cmd {} +". The latter will not run one file at a time. – Tometzky May 22 '09 at 08:47
  • 2
    Note that the `xargs` approach is actually significantly slower if there are no (or only a few) matching files and `cmd` doesn't have much to do for each file. For example, when run in an empty directory, the `xargs` version will take at least twice the time, since two processes must be started instead of just one. (Yes, the difference is usually imperceptible on *nix, but in a loop it could be important; or, try it on Windows some time ...) – SamB Jan 26 '14 at 23:05
2

Modern versions of xargs often support running several commands in parallel.

That can be a deciding factor when choosing between find … -exec and … | xargs
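
A minimal sketch of what that can look like, assuming an xargs that accepts -P (GNU and BSD xargs do, but it is an extension, not something every xargs has). The scratch directory, the eight files, the batch size of 2, and the limit of 4 parallel processes are arbitrary choices for illustration, and the `sh -c` wrapper stands in for cmd.

```shell
#!/bin/sh
# Scratch directory with eight files (illustrative only).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch f1 f2 f3 f4 f5 f6 f7 f8

# -n 2 hands each invocation at most two names;
# -P 4 allows up to four invocations to run at the same time.
# Each invocation reports how many names it received.
batches=$(find . -type f -print0 \
  | xargs -0 -n 2 -P 4 sh -c 'echo "batch of $#"' sh \
  | wc -l | tr -d ' ')

echo "batches=$batches"

cd / && rm -rf "$dir"
```

With parallel runs the output order is not deterministic, so a command whose output matters should write to separate files or be tolerant of interleaving.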

poige