
I have this find command:

find . -type f  -not -path '**/.git/**' -not -path '**/node_modules/**'  | xargs sed -i '' s/typescript-library-skeleton/xxx/g;

for some reason it's giving me these warnings/errors:

find: ./.git/objects/3c: No such file or directory
find: ./.git/objects/3f: No such file or directory
find: ./.git/objects/41: No such file or directory

I even tried using:

-not -path '**/.git/objects/**'

and got the same thing. Anybody know why the find is searching in the .git directory? Seems weird.

  • `-name .git -prune` would be much more efficient. Using `-not -path ...` tells `find` to ignore files matching that value, but it doesn't tell it to avoid recursing down the directory. – Charles Duffy May 08 '18 at 17:21
  • See https://stackoverflow.com/questions/37047322/gnu-find-when-does-the-default-action-apply for an example of a question where the default `-print` and an explicit `-print` differ in behavior, re: why `find ...` and `find ... -print` are not actually the same when it comes to negation logic. – Charles Duffy May 08 '18 at 17:22

2 Answers


why is the find searching in the .git directory?

GNU find is clever and supports several optimizations over a naive implementation:

  • It can flip the order of -size +512b -name '*.txt' and check the name first, because querying the size will require a second syscall.
  • It can count the hard links of a directory to determine the number of subdirectories, and once it has seen them all it no longer needs to check the remaining entries for -type d or for recursing.
  • It can even rewrite (-B -or -C) -and -A so that if the checks are equally costly and free of side effects, the -A will be evaluated first, hoping to reject the file after 1 test instead of 2.

However, it is not yet clever enough to realize that -not -path '*/.git/*' means that if you find a directory .git then you don't even need to recurse into it because all files inside will fail to match.

Instead, it dutifully recurses, finds each file and matches it against the pattern as if it was a black box.

To explicitly tell it to skip a directory entirely, you can instead use -prune. See How to exclude a directory in find . command
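
The difference can be seen in a throwaway directory. This is a minimal sketch, not part of the original answer: the temporary tree and the file names `3c/blob` and `index.ts` are made up for illustration. Both commands print the same files, but only the `-prune` form keeps find out of `.git`; the last two commands list what each traversal actually visits under `.git`.

```shell
# Build a small hypothetical tree: one file inside .git, one outside.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/.git/objects/3c" "$tmp/src"
touch "$tmp/.git/objects/3c/blob" "$tmp/src/index.ts"
cd "$tmp" || exit 1

# Filters each path after visiting it; find still walks everything under .git.
filtered=$(find . -type f -not -path '*/.git/*')

# Stops at the .git directory itself and never descends into it.
pruned=$(find . -name .git -prune -o -type f -print)

# Entries find reaches under .git without pruning...
visited=$(find . -path './.git/*' | LC_ALL=C sort)

# ...and with pruning (nothing, because the descent never happens).
after_prune=$(find . -name .git -prune -o -path './.git/*' -print)

printf 'filtered: %s\n' "$filtered"
printf 'pruned:   %s\n' "$pruned"
printf 'visited without -prune:\n%s\n' "$visited"
printf 'visited with -prune: [%s]\n' "$after_prune"
```

Note that the explicit `-print` in the pruning commands matters: without it, the pruned `.git` directory itself would be printed.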

that other guy
  • While it is more directly on-point as an answer for the OP's "why" question, I'm not sure this *is* a complete solution as-given. Someone trying to follow this could easily modify the original question's code to `find . -type f -name .git -prune -name node_modules -prune | xargs ...`, which wouldn't work at all (the `-prune`s won't ever match because of the prior `-type f`; the conditionals are `and`s, not `or`s, and the default `-print` rule has incorrect inferred precedence). More guidance is needed to describe *how to correctly apply* `-prune`. – Charles Duffy May 08 '18 at 19:49
  • @CharlesDuffy If the question is "how do I exclude a directory", then it should arguably be closed as a duplicate instead – that other guy May 08 '18 at 20:27
  • Good call -- I agree that linking to a question that demonstrates the practice suffices. – Charles Duffy May 08 '18 at 20:29

Both more efficient and more correct would be to avoid the default -print action, change -not -path ... to -prune, and ensure that xargs is only used with NUL-delimited input:

find . -name .git -prune -o \
       -name node_modules -prune -o \
       -type f -print0 | xargs -0 sed -i '' 's/typescript-library-skeleton/xxx/g'

Note the following points:

  • We use -prune to tell find to not even recurse down the undesired directories, rather than -not -path ... to tell it to discard names in those directories after they were found.
  • We put the -prunes before the -type f, so we're able to match directories for pruning.
  • We have an explicit action, not depending on the default -print. This is important because the default -print effectively adds a set of parentheses: if no explicit action is given, find ... behaves like find '(' ... ')' -print, not like find ... -print.
  • We use xargs only with the -0 argument enabling NUL-delimited input, and the -print0 action on the find side to generate a NUL-delimited list of names. NUL is the only character which cannot be present in an arbitrary file path (yes, newlines can be present) -- and thus the only character which is safe to use to separate paths. (If the -0 extension to xargs and the -print0 extension to find are not guaranteed to be available, use -exec sed -i '' ... {} + instead).
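
The points about the default -print and NUL-delimited input can be checked with a small sketch. The temporary tree, the file name `a file.txt`, and the search word `needle` are made up for illustration, and `grep -l` stands in for the `sed -i ''` call so the example does not depend on the BSD form of `-i`. It assumes a find and xargs that support `-print0` and `-0`.

```shell
# Hypothetical tree: a file inside .git and a file whose name contains a space.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/.git/objects" "$tmp/src"
touch "$tmp/.git/objects/blob"
echo 'needle' > "$tmp/src/a file.txt"
cd "$tmp" || exit 1

# No explicit action: find behaves like '(' ... ')' -print, so the pruned
# .git directory itself is printed along with the regular file.
implicit=$(find . -name .git -prune -o -type f | LC_ALL=C sort)

# Explicit -print binds only to the -type f branch, so .git is not printed.
explicit=$(find . -name .git -prune -o -type f -print)

# NUL-delimited names survive the space in "a file.txt"; a plain xargs
# would split that name into two arguments.
matched=$(find . -name .git -prune -o -type f -print0 | xargs -0 grep -l needle)

printf 'implicit:\n%s\n' "$implicit"
printf 'explicit: %s\n' "$explicit"
printf 'matched:  %s\n' "$matched"
```

Swapping `grep -l needle` for the `sed` command from the answer gives the real in-place edit once the file list looks right.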
Charles Duffy
  • Matter of opinion, but I do actually agree that shifting those up looks better. – Charles Duffy May 08 '18 at 17:31
  • I used: `find . -type f -name .git -prune -o -name node_modules -prune -o | xargs sed -i '' s/typescript-library-skeleton/waldo/g;` and got this error `no expression after -o` –  May 08 '18 at 17:35
  • Yes, `-o` means "or". You need to have an action (to be performed if the thing on the left wasn't true) on the right. Is there a reason you aren't using my answer as given? The third bullet point **explicitly** tells you not to rely on the default `-print` action, and the link to a related question I gave in a comment on the question explains in detail how that reliance causes bugs. – Charles Duffy May 08 '18 at 17:38
  • I'm passionate about this debate. `xargs` allows parallelizing the file search and command execution, so with expensive checks or cold directories it can be a significant boost. Additionally, `xargs` often allows parallelizing multiple command invocations, which massively improves independent CPU bound tasks on today's multicore systems – that other guy May 08 '18 at 17:55