41

I am new to Linux. I have a directory with approximately 250,000 files, and I need to count the files whose names match a pattern.

I tried the following command:

ls -1 20061101-20131101_kh5x7tte9n_2010_* | wc -l

I got the following error message:

-bash: /bin/ls: Argument list too long
0

Please help. Thanks in advance

fedorqui
db1

7 Answers

71

It might be better to use find for this:

find . -name "pattern_*" -printf '.' | wc -m

In your specific case:

find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_2010_*" -printf '.' | wc -m

find prints every file matching the criteria. -maxdepth 1 restricts the search to the given directory, without descending into subdirectories (thanks Petesh!). -printf '.' prints a single dot for every match instead of the file name, so that names containing newlines cannot distort the count.

wc -m then reports the number of characters, which equals the number of matching files.
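To see why the dot trick matters, here is a minimal sketch that assumes GNU find (for -printf) and uses a throwaway directory with made-up file names. One of the three names contains an embedded newline, so counting lines reports one more than the real number of files, while counting dots does not.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/pattern_a" "$demo_dir/pattern_b"
# A single file whose name contains an embedded newline.
touch "$demo_dir/pattern_c
d"

# One line per name: the newline inside the third name adds an extra line.
lines=$(find "$demo_dir" -maxdepth 1 -name 'pattern_*' | wc -l)

# One dot per match: unaffected by the characters in the names.
dots=$(find "$demo_dir" -maxdepth 1 -name 'pattern_*' -printf '.' | wc -m)

echo "counted by lines: $lines, counted by dots: $dots"
rm -rf "$demo_dir"
```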


Performance comparison of two possible options:

Let's create 10 000 files with this pattern:

$ for i in {1..10000}; do touch 20061101-20131101_kh5x7tte9n_201_$i; done

And then compare the time it takes to get the result with ls -1 ... or find ...:

$ time find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_201_*" | wc -l
10000

real    0m0.034s
user    0m0.017s
sys     0m0.021s

$ time ls -1 | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.254s
user    0m0.245s
sys     0m0.020s

find is about 7 times faster here! But if we use ls -1f, which skips sorting (thanks Petesh again!), then ls is even faster than find:

$ time ls -1f | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.023s
user    0m0.020s
sys     0m0.012s
fedorqui
  • 2
    to prevent recursing into subdirectories, you could use `-maxdepth 1` (if it's supported in that version of find) – Anya Shenanigans Jan 15 '14 at 16:39
  • 3
    ls has the bad habit of sorting before outputting, you should test with `ls -1 -f` to get a similar behaviour as find for performance evaluation – Anya Shenanigans Jan 15 '14 at 16:53
  • Pretty interesting, @Petesh, didn't know about it. I have tested the performance and to me with `ls -1f` it was even faster than `find`. – fedorqui Jan 16 '14 at 10:56
  • 3
    If you use the `-printf '.'` trick, you should count characters (`wc -m`) not lines. Alternatively, add a newline after the dot (`-printf '.\n'`). – sp00n Feb 20 '19 at 18:30
  • How about using `--count` (`-c`) for `grep` and skipping `wc`? I would expect performance gain. (And also a simpler expression.) Then again, for the same reasons, I would expect `find` with `-name` to be faster than `ls|grep` while apparently it is not... – Adam Badura Mar 30 '20 at 09:29
  • you should correct your answer to remove the wc -l everywhere and replace with wc -m – matthias_buehlmann Feb 22 '21 at 18:41
  • @matthias_buehlmann oh, good one. Updated, thanks. Feel free to [edit] if you find other things to improve. – fedorqui Feb 22 '21 at 19:25
6

You got "Argument list too long" because the shell expands the pattern into the full list of matching files before ls is even started. Try:

find . -maxdepth 1 -name '20061101-20131101_kh5x7tte9n_2010_*' | wc -l

Note that the pattern is enclosed in quotes to prevent shell expansion.
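The effect of the quotes can be checked with a small sketch. count_args is a hypothetical helper, not a standard command, and the file names are invented for the demonstration. It simply reports how many arguments it received, which shows what the shell hands over in each case.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/pattern_1" "$demo_dir/pattern_2" "$demo_dir/pattern_3"

# Unquoted: the shell replaces the pattern with every matching name.
unquoted=$(cd "$demo_dir" && count_args pattern_*)

# Quoted: the pattern is passed through as one literal argument,
# which is what find needs in order to do the matching itself.
quoted=$(cd "$demo_dir" && count_args 'pattern_*')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$demo_dir"
```

With 250,000 matching names, the unquoted form is what pushes an external command such as ls past the kernel's argument size limit.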

Odobenus Rosmarus
4

The MacOS / OS X command line solution

If you are attempting to do this on the command line on a Mac, you will soon find out that find does not support the -printf option.

To accomplish the same result as the solution proposed by fedorqui-supports-monica try this:

find . -name "pattern_*" -exec stat -f "." {} \; | wc -l

This will find all files matching the pattern you entered, print a . for each of them on its own line, then finally count the number of lines and output that number.


To limit your search depth to the current directory, add -maxdepth 1 to the command like so:

find . -maxdepth 1 -name "196288.*" -exec stat -f "." {} \; | wc -l
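As a possible alternative that avoids starting one stat process per file, the sketch below counts NUL terminators instead. It assumes a find that supports -print0 and -maxdepth (both GNU and BSD find do, although neither option is required by POSIX); the directory and file names are made up for the demonstration.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/196288.a" "$demo_dir/196288.b" "$demo_dir/other"

# -print0 ends every name with a NUL byte; tr deletes everything except
# those bytes, so wc -c counts exactly one byte per matching file.
count=$(find "$demo_dir" -maxdepth 1 -name '196288.*' -print0 | tr -dc '\0' | wc -c)

# $(( )) strips the padding that some wc implementations add.
echo "matches: $((count))"
rm -rf "$demo_dir"
```

Because a NUL byte cannot appear in a file name, this count is also safe for names that contain newlines.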
Tal Ater
4

Just do:

find . -name "pattern_*" |wc -l
Oscar
2

Try this:

ls -1 | grep 20061101-20131101_kh5x7tte9n_2010_ | wc -l
Dale
1

You should generally avoid ls in scripts. In fact, performing the count in a shell function avoids the "Argument list too long" error, because no exec boundary is crossed and so the ARG_MAX limit doesn't come into play.

number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

The conditional guards against the glob not being expanded at all, which is what happens by default when nothing matches: the pattern is passed through literally. (In Bash, you can shopt -s nullglob to make wildcards that don't match any files expand to nothing.)

Try it:

number_of_files 20061101-20131101_kh5x7tte9n_2010_*
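A minimal sketch of how this behaves, using the function from above in a scratch directory with invented file names. getconf ARG_MAX is a standard way to inspect the limit that external commands are subject to; the function never hits it because the shell calls it without an exec.

```shell
number_of_files () {
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

# The limit (in bytes) that applies when arguments are passed to exec.
limit=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $limit"

demo_dir=$(mktemp -d)
touch "$demo_dir/pattern_1" "$demo_dir/pattern_2" "$demo_dir/pattern_3"

# Matching glob: the function receives one argument per file.
matched=$(cd "$demo_dir" && number_of_files pattern_*)

# Non-matching glob: the literal pattern arrives as $1, -e fails, result is 0.
unmatched=$(cd "$demo_dir" && number_of_files nothing_*)

echo "matched: $matched, unmatched: $unmatched"
rm -rf "$demo_dir"
```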
tripleee
-3
ls -1 | grep '20061101-20131101_kh5x7tte9n_2010_*' | wc -l

The previous answer included neither quotes around the search pattern nor the * wildcard.

Jas
  • 1
    This is basically a repeat of a previous answer plus it won't work. – blm Nov 07 '15 at 08:28
  • This is confusing shell wildcards and regular expressions. `grep` supports the latter, and will find a match on any substring, so the trailing wildcard is unnecessary, and also doesn't mean what you think. I support the idea that you should generally use quoting around your regexes, but in this particular case, it's not necessary, and the incorrect regex ruins the answer. For the record, the wildcard `*` (which mustn't be quoted) corresponds to the regex `.*` – tripleee Dec 27 '18 at 06:45
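To illustrate the difference the comments describe, here is a small sketch with invented file names. It uses grep -c, as suggested in an earlier comment, instead of a separate wc, and anchors the regex with ^ so that it behaves like the glob pattern_* rather than matching the text anywhere in the name.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/pattern_1" "$demo_dir/pattern_2" "$demo_dir/my_pattern_3"

# Unanchored regex: matches the substring anywhere, so my_pattern_3 counts too.
loose=$(ls -1 "$demo_dir" | grep -c 'pattern_')

# Anchored regex: only names that start with the prefix, like the glob pattern_*.
anchored=$(ls -1 "$demo_dir" | grep -c '^pattern_')

echo "unanchored: $loose, anchored: $anchored"
rm -rf "$demo_dir"
```

Like every ls-based count, this assumes that no file name contains a newline.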