
I need to find (or more specifically, count) all files that match this pattern:

*/foo/*.doc

Here, the first asterisk should match a variable number of nested subdirectories.

jww
pw222
  • Does it have to be bash? zsh can do this using the syntax `ls **/foo/*.doc`. – Alastair Apr 22 '14 at 00:06
  • Alastair, thanks for the suggestion. I was not aware of zsh and its double-asterisk syntax. Interestingly, it appears the resulting expanded argument list is too long for ls (approx. 6000 filenames) and gives an error. – pw222 Apr 22 '14 at 00:19
  • Bash v4 also supports the `**` recursive glob. – tripleee Dec 12 '15 at 16:43
  • An internal command like `echo` avoids the `ARG_MAX` problem (argument list too long). [You should not be using `ls` in scripts.](http://mywiki.wooledge.org/ParsingLs) – tripleee Dec 12 '15 at 16:46
  • @tripleee Bash v4 supports `**` recursive glob, but you must first `shopt -s globstar`. See https://tiswww.case.edu/php/chet/bash/bashref.html#The-Shopt-Builtin – BitwiseMan May 24 '17 at 19:42
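The bash globstar approach from these comments can be sketched as a small function. This assumes bash 4 or later, and it loops over the glob inside the shell instead of passing the names to ls, so ARG_MAX never comes into play. The name count_foo_docs_globstar is only illustrative. Note that **/ also matches zero directories, so foo/*.doc directly under the starting directory is counted too, and hidden directories are skipped unless dotglob is set.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash >= 4): count regular files matching **/foo/*.doc
# below the given directory (default: the current directory).
count_foo_docs_globstar() (
  # The ( ) body runs in a subshell, so cd and shopt do not leak out.
  cd -- "${1:-.}" || exit 1
  shopt -s globstar nullglob   # ** recurses; unmatched globs expand to nothing
  count=0
  for f in **/foo/*.doc; do
    if [ -f "$f" ]; then       # skip e.g. a directory named x.doc
      count=$((count + 1))
    fi
  done
  echo "$count"
)
```

Usage: count_foo_docs_globstar /path/to/base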

4 Answers


With GNU find you can use -regex, which (unlike -name) matches against the entire path:

find . -regex '.*/foo/[^/]*\.doc'

To just count the number of files:

find . -regex '.*/foo/[^/]*\.doc' -printf '%i\n' | wc -l

(The %i format code makes find print the inode number instead of the file name; unlike a file name, an inode number cannot contain characters such as a newline, so the count is more reliable. Thanks to @tripleee for the suggestion.)

I don't know if that will work on OSX, though.
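A minimal sketch of the same GNU find command wrapped in a function (the name count_foo_docs_regex is illustrative). It escapes the dot so that it matches only a literal period; an unescaped . would also accept a name like 1adoc. It also adds -type f so that directories named like x.doc are not counted. -printf is a GNU extension, so this assumes GNU findutils.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find: count regular files whose path ends in .../foo/<name>.doc
count_foo_docs_regex() {
  # -regex is matched against the whole path, so the pattern starts with .*
  # [^/]* keeps the file directly inside foo; \. is a literal dot.
  # -printf '%i\n' prints one inode number per file, so a newline
  # in a file name cannot inflate the line count.
  find "${1:-.}" -type f -regex '.*/foo/[^/]*\.doc' -printf '%i\n' | wc -l
}
```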

rici

How about:

find BASE_OF_SEARCH/*/foo -name \*.doc -type f | wc -l

What this is doing:

  • start at directory BASE_OF_SEARCH/
  • look in every directory named foo exactly one level below BASE_OF_SEARCH (the shell expands the *)
  • look for files named like *.doc (find still descends into subdirectories of foo)
  • count the lines of the result (one per file)
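If only the .doc files directly inside each foo should count (as in the */foo/*.doc pattern), find needs -maxdepth 1; without it, find also descends into subdirectories of foo. Below is a sketch assuming bash (the function name count_fixed_depth is illustrative) that also avoids passing an unexpanded */foo to find when no such directory exists.

```shell
#!/usr/bin/env bash
# Sketch: count *.doc files directly inside BASE/*/foo (fixed depth).
count_fixed_depth() (
  # The ( ) body is a subshell, so the shopt change does not leak out.
  base=${1:-.}
  shopt -s nullglob            # an unmatched glob expands to nothing, not itself
  dirs=("$base"/*/foo)         # foo entries exactly one level below base
  if [ "${#dirs[@]}" -eq 0 ]; then
    echo 0
    exit 0                     # exits only the subshell
  fi
  # -maxdepth 1: only direct children of each foo, i.e. */foo/*.doc
  find "${dirs[@]}" -maxdepth 1 -type f -name '*.doc' | wc -l
)
```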

The benefit of this method:

  • neither recursive nor iterative in the command itself (no loops)
  • it's easy to read, and if you include it in a script it's fairly easy to decipher (regex sometimes is not).

UPDATE: you want variable depth? OK:

find BASE_OF_SEARCH -name \*.doc -type f | grep foo | wc -l

  • start at directory BASE_OF_SEARCH
  • look for files named like *.doc
  • only show the lines of this result that include "foo"
  • count the lines of the result (one per file)

Optionally, you could filter out results where "foo" appears elsewhere in the path (for example in the file name, or in a directory such as foobar), because grep foo matches those too.
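A stricter filter than grep foo is to require that foo be the last directory in the path. A sketch under that assumption (count_variable_depth is an illustrative name); grep -c counts the matching lines directly, and, as with wc -l, a file name containing a newline would still skew the result.

```shell
#!/usr/bin/env bash
# Sketch: count *.doc files at any depth whose parent directory is named foo.
count_variable_depth() {
  # '/foo/[^/]*$' means: "/foo/" followed by a final component with no slash,
  # so foobar/x.doc and foo/sub/x.doc are both rejected.
  find "${1:-.}" -type f -name '*.doc' | grep -c '/foo/[^/]*$'
}
```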

MonkeyWidget
  • This works except for the fact that it does not work with a variable subdirectory depth between BASE_OF_SEARCH and foo. Perhaps I wasn't clear enough with that specification. It's close enough though that I was able to accomplish the task I set out to do, so an upvote and thank you. – pw222 Apr 22 '14 at 00:03
  • You should emphasize that it is not recursive. However, that is often not needed, and then it is a simple and nice solution, though it could have performance issues; I don't know. – robsch May 31 '16 at 12:32
  • I've added a feature for your requests – MonkeyWidget Jun 11 '16 at 17:03

Based on the answers on this page and on other pages, I put together the following: it searches the current folder and every folder below it for files with the extension .pdf, then keeps only those whose path contains test_text.

find . -name "*.pdf" | grep test_text | wc -l
Tsitsi_Catto
  • I managed to find the original post with the answer containing all the info https://unix.stackexchange.com/questions/123440/why-is-my-find-not-recursive – Tsitsi_Catto Apr 30 '18 at 13:49
  • The `grep` is overkill in this answer. The following will give you the same result: `find . -name "*test_text*.pdf" | wc -l` – apeman Sep 05 '22 at 17:30
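The two variants above are not always equivalent: grep sees the whole path, so it also counts a .pdf inside a directory whose name contains test_text, while -name only looks at the file name itself. A sketch contrasting them (both function names are illustrative):

```shell
#!/usr/bin/env bash
# Count .pdf files whose file name contains test_text (-name sees only the name).
count_by_name() {
  find "${1:-.}" -type f -name '*test_text*.pdf' | wc -l
}

# Count .pdf files whose path contains test_text anywhere (what grep sees).
count_by_path() {
  find "${1:-.}" -type f -name '*.pdf' | grep -c 'test_text'
}
```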

Untested, but try:

find . -type d -name foo -print | while read d; do echo "$d/*.doc" ; done | wc -l

Find all the "foo" directories (at varying depths; this ignores symlinks, but you can add them if that is part of the problem), use shell globbing to find the ".doc" files in each, then count them.
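A corrected sketch of this approach, swapping the while read loop for find -exec sh -c so that the glob is left unquoted and actually expands inside each foo directory. It prints one dot per matching regular file, so a newline in a file name does not affect the count. The function name count_via_foo_dirs is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: find every directory named foo (any depth), then let the shell
# glob *.doc inside each one and count the regular files found.
count_via_foo_dirs() {
  find "${1:-.}" -type d -name foo -exec sh -c '
    for d do                      # loop over the directories passed by find
      for f in "$d"/*.doc; do     # unquoted *: the glob really expands
        if [ -f "$f" ]; then      # an unmatched glob stays literal; skip it
          echo .
        fi
      done
    done
  ' sh {} + | wc -l
}
```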

mpez0
  • The `while` loop is fully redundant and somewhat error-prone. Also, the wildcard will not be expanded because it is quoted. Just pipe `find -print` to `wc -l`. However, this will still give the wrong count if a file name contains a newline. – tripleee Dec 12 '15 at 16:00