
I'm running a script that looks at all the files in a directory and its subdirectories.

The script has been running for a day, and I'd like to estimate how long it will keep running. I know how many files it processed so far (73,000,000), but I don't know the total number of files.

What is the fastest way to count the files?

I tried right-clicking on the directory and selecting "properties", and it's slowly counting up. I tried redirecting ls into a file, and it's just churning & churning...

Should I write a program in C?

mattias
Ada Lovelace
    Possible duplicate of [Fast Linux File Count for a large number of files](https://stackoverflow.com/questions/1427032/fast-linux-file-count-for-a-large-number-of-files) – phuclv Jun 09 '17 at 10:56

4 Answers


The simplest way:

find <dir> -type f | wc -l

Slightly faster, perhaps:

find <dir> -type f -printf '\n' | wc -l
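
If some filenames might contain embedded newlines, a newline-based count will over-count them; a small variant of the same idea (assuming GNU find, which provides -printf) prints one character per file and counts bytes instead:

find <dir> -type f -printf x | wc -c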
John Kugelman

I did some quick research. Using a directory containing 100,000 files, I compared the following commands:

ls -R <dir>
ls -lR <dir>
find <dir> -type f

I ran them twice, once redirecting into a file (>file), and once piping into wc (|wc -l). Here are the run times in seconds:

        >file   |wc
ls -R     14     14
find      89     56
ls -lR    91     82

The difference between >file and |wc -l is smaller than the difference between ls and find.

It appears that ls -R is at least 4x faster than find.
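
For what it's worth, a rough way to reproduce such a comparison is the shell's time keyword (writing to /tmp/out here purely as a scratch file); the absolute numbers will vary a lot with the filesystem and with whether the directory entries are already cached:

time ls -R <dir> > /tmp/out
time ls -R <dir> | wc -l
time find <dir> -type f > /tmp/out
time find <dir> -type f | wc -l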

Ada Lovelace
    I can't undo my upvote, but this is wrong. `ls -R` may be faster, but it also gives incorrect results. First off, this counts directories as well as files. Worse yet, the output is formatted, and those formatting lines are counted as well. – Eric Haynes Jul 13 '18 at 18:23

Fastest I know about:

ls | wc -l

Note: keep in mind though that it counts all entries inside the directory, not just regular files, so subdirectories are included in the total (and with ls -a the . and .. entries would be counted as well).
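
If you only want regular files at the top level, one possible workaround (borrowing the -type f test from the find-based answers) is:

find <dir> -maxdepth 1 -type f | wc -l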

If you need a recursive count of the files in all subdirectories (rather than just the entries directly inside the current directory), then you can add the "recursive" flag to the ls command:

ls -R | wc -l

If you compare the speed of this to the find suggestion, you will see that it is much faster (by a factor of 2 to 10), but keep in mind the note above.
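
As one of the comments below points out, disabling sorting speeds ls up further; note that GNU ls's -f switch also implies -a, so the . and .. entries end up in the count as well:

ls -f <dir> | wc -l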

arkascha
  • This misses the files in subdirectories. – R Samuel Klatchko Jun 04 '15 at 19:39
  • @RSamuelKlatchko Yeah, just added that option as a second alternative. The OP's question is a bit vague about this... – arkascha Jun 04 '15 at 19:41
  • I tried ls -R, and I'm redirecting it into a file instead of piping into wc, so that I can see how far it got. It's still running. So I was wondering if there's a faster way. – Ada Lovelace Jun 04 '15 at 19:50
  • Hm, `ls` is _pretty_ fast. Could it be that your performance issue is writing to the file? Keep in mind that writing files is a very slow process... – arkascha Jun 04 '15 at 19:51
  • Sure, I see the problem, but slowing it down further certainly won't help :-) I doubt you can find or implement something really faster than the `ls` command. – arkascha Jun 04 '15 at 20:20
  • `ls` can even be faster than `ls` if you add the `-f` switch. Something faster than the `ls` command? No problem: https://stackoverflow.com/a/28368788/276232 – Christopher Schultz Apr 10 '18 at 17:36
  • `ls -R` has formatting in the output that makes the resulting count incorrect. – Eric Haynes Jul 13 '18 at 18:38

ls is not fast at all, and for your purpose it is not even suitable: ls prints an alphabetically sorted list of items, so it has to wait for the OS to return the whole list of directory entries, sort them, and print them to standard output, and only then can the result be filtered by counting newline characters.

A lot of work for a simple task, and even worse: if any of your files has a newline in its name, it will be counted more than once.

find, on the other hand, doesn't sort. It also has the advantage of executing its actions as soon as each buffer of entries is returned from the filesystem, so you'll start seeing results immediately, and it will consume far less memory.

So prefer this approach instead:

find . -mindepth 1 -maxdepth 1 -ignore_readdir_race -printf x | wc -m

It will print an "x" to standard output for every item found in the current directory (excluding the current directory itself, thanks to -mindepth 1), without recursing (-maxdepth 1), and then count the characters.

Given that the folder is very full, -ignore_readdir_race will ignore errors for files that are deleted while counting.

If you want to be able to check the count while it is still running, redirect the output to a file (ideally on a tmpfs, so everything stays in memory and writing doesn't become a bottleneck), then detach the process:

nohup find . -mindepth 1 -maxdepth 1 -ignore_readdir_race -printf x > /tmp/count.txt &

Then when you want to see the actual count:

wc -m /tmp/count.txt

Or just keep watching it increase...

watch wc -m /tmp/count.txt

Have fun