Recursively find all files that match a certain pattern

Question

I need to find (or more specifically, count) all files that match this pattern:

*/foo/*.doc

Where the first wildcard asterisk includes a variable number of subdirectories.

Does it have to be bash? zsh can do this using the syntax `ls **/foo/*.doc`. — Alastair, Apr 22 '14 at 00:06
Alastair, thanks for the suggestion. I was not aware of zsh and its double-asterisk syntax. Interestingly, it appears the resulting expanded argument list is too long for ls (approx. 6000 filenames) and gives an error. — pw222, Apr 22 '14 at 00:19
An internal command like `echo` avoids the `ARG_MAX` problem (argument list too long). [You should not be using `ls` in scripts.](http://mywiki.wooledge.org/ParsingLs) — tripleee, Dec 12 '15 at 16:46
@tripleee Bash v4 supports `**` recursive glob, but you must first `shopt -s globstar`. See https://tiswww.case.edu/php/chet/bash/bashref.html#The-Shopt-Builtin — BitwiseMan, May 24 '17 at 19:42
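
For reference, here is a minimal sketch of the bash route mentioned in the comments above. It assumes bash 4 or later is installed (the stock bash 3.2 on macOS has no `globstar`). The matches are collected into an array and the array length is printed, so no argument list is passed to an external command and the `ARG_MAX` limit does not come into play. The temporary tree and all file names are invented for the demo.

```sh
# Throwaway demo tree; every name below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo" "$tmp/a/foo/sub" "$tmp/a/b/foo" "$tmp/a/foobar"
touch "$tmp/foo/top.doc" "$tmp/a/foo/one.doc" "$tmp/a/b/foo/two.doc"
touch "$tmp/a/foo/sub/deep.doc"   # below foo/ but not directly in it: no match
touch "$tmp/a/foobar/other.doc"   # directory is not named exactly foo: no match

# Run the glob under bash explicitly, since globstar is a bash (4+) option.
count=$(cd "$tmp" && bash -c '
    shopt -s globstar nullglob    # ** recurses; an unmatched glob expands to nothing
    files=( **/foo/*.doc )        # expanded by the shell itself, no exec involved
    echo "${#files[@]}"
')
echo "$count"                     # 3 for the tree above

rm -rf "$tmp"
```

Note that `**/` also matches zero directories, so a `foo` directly under the starting point is included.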

rici · Accepted Answer · 2015-12-16T14:53:31.663

66

With GNU find you can use -regex, which (unlike -name) matches against the entire path:

find . -regex '.*/foo/[^/]*\.doc'

To just count the number of files:

find . -regex '.*/foo/[^/]*\.doc' -printf '%i\n' | wc -l

(The %i format code causes find to print the inode number instead of the filename; unlike the filename, the inode number is guaranteed to not have characters like a newline, so counting is more reliable. Thanks to @tripleee for the suggestion.)

I don't know if that will work on OSX, though.
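
To see which paths the pattern accepts and rejects, here is a small sketch that runs the counting command against a throwaway tree. It assumes GNU find (for `-printf`); the directory layout and file names are made up for illustration. The dot before `doc` is escaped so that it matches only a literal dot.

```sh
# Throwaway demo tree; every name below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/foo" "$tmp/a/b/foo/sub" "$tmp/foobar"
touch "$tmp/a/foo/one.doc" "$tmp/a/b/foo/two.doc"   # the two expected matches
touch "$tmp/a/b/foo/sub/deep.doc"   # nested below foo/: rejected by [^/]*
touch "$tmp/foobar/three.doc"       # parent is foobar, not foo
touch "$tmp/a/foo/notes.txt"        # wrong extension
touch "$tmp/a/foo/xdoc"             # no literal dot: rejected because the dot is escaped

# Print one inode number per match and count the lines.
count=$(find "$tmp" -regex '.*/foo/[^/]*\.doc' -printf '%i\n' | wc -l)
echo "$count"    # 2 for the tree above

rm -rf "$tmp"
```

Because `-regex` must match the whole path, `[^/]*` is what keeps the file directly inside `foo` rather than somewhere deeper below it.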

edited Dec 16 '15 at 14:53

answered Apr 22 '14 at 00:03

rici


Append "| wc -l" to the end of this and it's perfect. – pw222 Apr 22 '14 at 00:11
What about just `-printf '0\n'`? We don't really need the inode at all. – Cœur May 02 '18 at 01:12

MonkeyWidget · Answer 2 · 2016-10-03T15:45:48.600

13

how about:

find BASE_OF_SEARCH/*/foo -name \*.doc -type f | wc -l

What this is doing:

start at directory BASE_OF_SEARCH/
look in the foo directory of each immediate subdirectory (exactly one level down)
look for files named like *.doc
count the lines of the result (one per file)

The benefit of this method:

neither recursive nor iterative (no shell loops)
it's easy to read, and if you include it in a script it's fairly easy to decipher (regex sometimes is not).

UPDATE: you want variable depth? ok:

find BASE_OF_SEARCH -name \*.doc -type f | grep foo | wc -l

start at directory BASE_OF_SEARCH
look for files named like *.doc
only show the lines of this result that include "foo"
count the lines of the result (one per file)

Optionally, you could filter out results that have "foo" in the filename, because this will show those too.
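
One possible way to do that filtering, sketched under the assumption that only files sitting directly inside a directory named exactly foo should count: anchor the grep pattern so that `/foo/` has to be the last directory component of the path. The tree and file names below are invented for the demo, and like any line-based count this still miscounts file names that contain a newline.

```sh
# Throwaway demo tree; every name below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/foo/sub" "$tmp/a/foobar" "$tmp/a/b"
touch "$tmp/a/foo/one.doc"        # the only file directly inside a foo directory
touch "$tmp/a/foo/sub/deep.doc"   # foo is an ancestor, not the parent
touch "$tmp/a/foobar/two.doc"     # "foo" is only part of the directory name
touch "$tmp/a/b/foo.doc"          # "foo" is only in the file name

# Unanchored: counts every path that contains "foo" anywhere.
loose=$(find "$tmp" -name '*.doc' -type f | grep -c foo)
# Anchored: "/foo/" must be followed by a slash-free file name up to end of line.
strict=$(find "$tmp" -name '*.doc' -type f | grep -c '/foo/[^/]*$')
echo "$loose $strict"    # 4 1 for the tree above

rm -rf "$tmp"
```

`grep -c` prints the number of matching lines, so the separate `wc -l` is not needed here.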

edited Oct 03 '16 at 15:45

answered Apr 21 '14 at 23:54

MonkeyWidget


This works except for the fact that it does not work with a variable subdirectory depth between BASE_OF_SEARCH and foo. Perhaps I wasn't clear enough with that specification. It's close enough though that I was able to accomplish the task I set out to do, so an upvote and thank you. – pw222 Apr 22 '14 at 00:03
You should emphasize that it is not recursive. However, recursion is often not needed, and then this is a simple and nice solution. It could have performance issues, though; I don't know. – robsch May 31 '16 at 12:32
I've added a feature for your requests – MonkeyWidget Jun 11 '16 at 17:03

score 12 · Answer 3 · answered Apr 30 '18 at 13:42


Based on the answers on this page and on other pages, I managed to put together the following. It searches the current folder and all folders under it for files with the extension pdf, then filters for those that contain test_text in their name.

find . -name "*.pdf" | grep test_text | wc -l

answered Apr 30 '18 at 13:42

Tsitsi_Catto


I managed to find the original post with the answer containing all the info https://unix.stackexchange.com/questions/123440/why-is-my-find-not-recursive – Tsitsi_Catto Apr 30 '18 at 13:49
The `grep` is overkill in this answer. The following will give you the same result: `find . -name "*test_text*.pdf" | wc -l` – apeman Sep 05 '22 at 17:30

score 2 · Answer 4 · answered Apr 22 '14 at 00:00


Untested, but try:

find . -type d -name foo -print | while read d; do echo "$d/*.doc" ; done | wc -l

find all the "foo" directories (at varying depths) (this ignores symlinks, if that's part of the problem you can add them); use shell globbing to find all the ".doc" files, then count them.

answered Apr 22 '14 at 00:00

mpez0


The `while` loop is fully redundant and somewhat error-prone. Also, the wildcard will not be expanded because it is quoted. Just pipe `find -print` to `wc -l`. However, this will still give the wrong count if a file name contains a newline. – tripleee Dec 12 '15 at 16:00
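
For comparison, here is a sketch of how the idea in this answer (find the foo directories, then glob inside each one) could be made to work. It is not the answerer's code: the glob is left unquoted inside a small `sh -c` script so that it expands, and one fixed line is printed per regular file, so odd characters in file names cannot skew the count. The tree and file names are invented for the demo.

```sh
# Throwaway demo tree; every name below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/foo" "$tmp/a/b/foo/sub" "$tmp/c/foo"
touch "$tmp/a/foo/one.doc" "$tmp/a/b/foo/two.doc"
touch "$tmp/a/b/foo/sub/deep.doc"   # not directly inside a foo directory
touch "$tmp/a/foo/notes.txt"        # wrong extension
# "$tmp/c/foo" stays empty, so its glob matches nothing

count=$(find "$tmp" -type d -name foo -exec sh -c '
    for f in "$1"/*.doc; do       # unquoted glob, so the shell expands it
        [ -f "$f" ] && echo x     # one fixed line per regular file
    done
' sh {} \; | wc -l)
echo "$count"    # 2 for the tree above

rm -rf "$tmp"
```

The `[ -f "$f" ]` test also covers the case where the glob matches nothing and stays as the literal string `*.doc`.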
