
Branching off from another thread, "Move file that has aged x minutes", this question came up:

How does the `find` command typically found on Linux search for files in the current directory?

Consider a directory that contains a fairly large number of files, then:

First, `find MY_FILE.txt` returns immediately; second, `find . -name MY_FILE.txt` takes much longer.

I used `strace -c` to see what happens in both cases, and I learned that the second command invokes a directory scan, which explains why it's slower.

So the first command must be optimized. Can anybody point me to an appropriate resource or provide a quick explanation of how this might be implemented?

John Kugelman
stephanmg
  • Despite being not really useful, in the thread I referenced, people used `find` on a single file to find out if the file is older than e.g. 10 minutes. – stephanmg Nov 13 '19 at 09:14
  • The `find` command is in the [GNU findutils](https://www.gnu.org/software/findutils/) package. – jww Nov 13 '19 at 10:42

1 Answer

The syntax for `find` is `find <paths> <expression>`, where `<paths>` is a list of files and directories to start the search from. `find` starts from those locations and then recurses (if they're directories).
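To make the starting-point behaviour concrete, here is a small sketch run in a throwaway directory. The tree and the file names (`a/sub`, `b/other.txt`, `top.txt`) are invented for the demo; only the standard `find`, `mktemp`, and `sort` utilities are assumed.

```shell
# Build a small scratch tree (all names here are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/a/sub" "$demo/b"
touch "$demo/a/sub/MY_FILE.txt" "$demo/b/other.txt" "$demo/top.txt"
cd "$demo" || exit 1

# Two starting points: a directory and a regular file.
# find recurses into a/, but for top.txt it can only report the file itself.
# b/ is never visited because it was not given as a starting point.
multi=$(find a top.txt | LC_ALL=C sort)
printf '%s\n' "$multi"
```

Note that `b/other.txt` never shows up: `find` only looks at what you hand it, plus whatever lies underneath.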

When you write `find . -name MY_FILE.txt` it performs a recursive search under the `./` directory. But if you write `find MY_FILE.txt` then you're telling it to start the search at `./MY_FILE.txt`, and so it does:

$ strace -e file find MY_FILE.txt
...
newfstatat(AT_FDCWD, "MY_FILE.txt", 0x556688ecdc68, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
...
(No such file or directory)
: No such file or directory
+++ exited with 1 +++

Since the path doesn't exist, it only takes a single system call to determine that there's no such file. It calls `newfstatat()`, gets a "No such file or directory" error (`ENOENT`), and that's that.

In other words, `find MY_FILE.txt` isn't equivalent to `find . -name MY_FILE.txt`. Heck, I wouldn't even call it useful because you're not asking it to search. You're just asking it to tell you whether `MY_FILE.txt` exists in the current directory. But you could find that out by simply running `ls MY_FILE.txt`.
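As a rough sketch of that point, the checks below all look up a single path rather than scanning a directory. The scratch directory and `NO_SUCH_FILE.txt` are hypothetical. The `-mmin` test mirrors the "is this file older than 10 minutes" use mentioned in the comments on the question; it is supported by GNU and BSD `find`, but it is not part of POSIX.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
touch MY_FILE.txt

# Existence check, three ways. Each one just looks up the single path.
ls MY_FILE.txt >/dev/null 2>&1 && via_ls=yes || via_ls=no
[ -e MY_FILE.txt ] && via_test=yes || via_test=no
find MY_FILE.txt >/dev/null 2>&1 && via_find=yes || via_find=no

# A starting point that does not exist makes find exit non-zero.
find NO_SUCH_FILE.txt >/dev/null 2>&1 && missing=yes || missing=no

# Where a single-path find does earn its keep: applying a test to that path.
# The file was just created, so "+10" (older than 10 minutes) prints nothing
# and "-10" (newer than 10 minutes) prints the name.
older=$(find MY_FILE.txt -mmin +10)
newer=$(find MY_FILE.txt -mmin -10)

echo "ls=$via_ls test=$via_test find=$via_find missing=$missing"
echo "older='$older' newer='$newer'"
```

For a plain existence check in a script, `[ -e ... ]` is probably the most direct choice, since it needs no external process.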

Here's the difference:

[~]$ cd /usr
[/usr]$ find . -name sha384sum
./bin/sha384sum
[/usr]$ find sha384sum
find: ‘sha384sum’: No such file or directory

The first one performs a recursive search and finds `/usr/bin/sha384sum`. The second one doesn't recurse and immediately fails because `/usr/sha384sum` doesn't exist. It doesn't look any deeper, so it finishes almost instantly.
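If you'd rather not poke around in `/usr`, the same contrast can be reproduced in a scratch directory. This is only a sketch: the `bin/sha384sum` file created here is an empty stand-in, not the real binary. The last command uses `-maxdepth 1`, a GNU/BSD extension that is not in POSIX, to show a third case: scanning the starting directory without descending into subdirectories.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/bin"
touch "$demo/bin/sha384sum"   # empty placeholder file, not the real tool
cd "$demo" || exit 1

# Recursive search from the current directory: descends into bin/.
recursive=$(find . -name sha384sum)

# Starting point "sha384sum" does not exist here, so find fails right away.
direct_status=0
direct=$(find sha384sum 2>/dev/null) || direct_status=$?

# Scan only the top level: the directory is read, but bin/ is not entered.
shallow=$(find . -maxdepth 1 -name sha384sum)

echo "recursive='$recursive'"
echo "direct='$direct' (exit status $direct_status)"
echo "shallow='$shallow'"
```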

John Kugelman
  • I see, so it is a single system call. I would then have to look for the implementation of this `system call` to see how it actually works I guess. – stephanmg Nov 13 '19 at 09:05