
I am working on a directory listing project. I need to enumerate all the files on the computer and store their paths in a queue, which worker threads will then consume.

Right now I am using this nftw() example code:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

static int
display_info(const char *fpath, const struct stat *sb,
             int tflag, struct FTW *ftwbuf)
{
    printf("%-3s %2d %7jd   %-40s %d %s\n",
        (tflag == FTW_D) ?   "d"   : (tflag == FTW_DNR) ? "dnr" :
        (tflag == FTW_DP) ?  "dp"  : (tflag == FTW_F) ?   "f" :
        (tflag == FTW_NS) ?  "ns"  : (tflag == FTW_SL) ?  "sl" :
        (tflag == FTW_SLN) ? "sln" : "???",
        ftwbuf->level, (intmax_t) sb->st_size,
        fpath, ftwbuf->base, fpath + ftwbuf->base);
    return 0;           /* To tell nftw() to continue */
}

int
main(int argc, char *argv[])
{
    int flags = 0;

    if (argc > 2 && strchr(argv[2], 'd') != NULL)
        flags |= FTW_DEPTH;
    if (argc > 2 && strchr(argv[2], 'p') != NULL)
        flags |= FTW_PHYS;

    if (nftw((argc < 2) ? "." : argv[1], display_info, 20, flags)
            == -1) {
        perror("nftw");
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
}

I have noticed that it starts out very fast and then slows down quickly, to the point where each batch of 1,000 files takes roughly 7 seconds. I am looking for a way to speed this up.

Sean Bright
Doritos
  • How long does `find` or `ls -R` take on the same directory? [glibc/ftw.c:nftw()](https://github.com/bminor/glibc/blob/master/io/ftw.c#L539) internally just calls `readdir`, so you could call `readdir` yourself. But usually the I/O itself is what takes the longest. – KamilCuk Jan 10 '20 at 17:50
  • Running `find` on the root directory '/' completes in less than 2 minutes. nftw() ran for over 10 minutes without finishing before I stopped it. – Doritos Jan 10 '20 at 17:53
  • I found the issue. It always happens right after I ask a question... The problem was adding things to my queue. I removed the queue and it now runs faster than `find`. – Doritos Jan 10 '20 at 17:56
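
The comments do not say what was wrong with the queue, so the following is only a guess at one common cause: an append that walks the whole list gets slower as the queue grows, which would match a walk that starts fast and then degrades. A minimal sketch of a queue with constant-time appends is shown below. The names `path_queue`, `path_queue_push`, and `path_queue_pop` are hypothetical and not from the question. Each path is copied, on the assumption that the string passed to the nftw() callback should not be relied on after the callback returns.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical singly linked queue of path names (not the asker's code). */
struct path_node {
    char *path;
    struct path_node *next;
};

struct path_queue {
    struct path_node *head;  /* oldest entry, removed first */
    struct path_node *tail;  /* newest entry, kept so appends need no list walk */
    size_t count;
};

/* Append a private copy of path. Returns 0 on success, -1 if allocation fails. */
static int path_queue_push(struct path_queue *q, const char *path)
{
    size_t len = strlen(path) + 1;
    struct path_node *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->path = malloc(len);
    if (node->path == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->path, path, len);
    node->next = NULL;

    if (q->tail != NULL)
        q->tail->next = node;   /* link behind the current last node */
    else
        q->head = node;         /* queue was empty */
    q->tail = node;
    q->count++;
    return 0;
}

/* Remove the oldest path. The caller frees it. Returns NULL when empty. */
static char *path_queue_pop(struct path_queue *q)
{
    struct path_node *node = q->head;
    char *path;

    if (node == NULL)
        return NULL;
    path = node->path;
    q->head = node->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->count--;
    free(node);
    return path;
}
```

An nftw() callback could call `path_queue_push(&q, fpath)` on a file-scope queue and return nonzero on failure to stop the walk. This sketch is not thread-safe; sharing it with worker threads would need a lock around push and pop.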

1 Answer


In the page that you linked, there is this explanation for this behavior:

To avoid using up all of the calling process's file descriptors, nopenfd specifies the maximum number of directories that ftw() will hold open simultaneously. When the search depth exceeds this, ftw() will become slower because directories have to be closed and reopened. ftw() uses at most one file descriptor for each level in the directory tree.
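
One way to check whether nopenfd matters for a given tree is to run the same walk with different values and time each run. The sketch below is only an illustration: `count_entries` and `count_entry` are hypothetical names, and the callback does nothing but count, so that any difference comes from the walk itself rather than from per-file work.

```c
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() callbacks receive no user pointer, so the counter lives at file scope. */
static long entries_seen;

static int count_entry(const char *fpath, const struct stat *sb,
                       int tflag, struct FTW *ftwbuf)
{
    (void) fpath; (void) sb; (void) tflag; (void) ftwbuf;
    entries_seen++;
    return 0;               /* Tell nftw() to continue */
}

/* Walk root with the given nopenfd.
   Returns the number of entries visited, or -1 if nftw() fails. */
static long count_entries(const char *root, int nopenfd)
{
    entries_seen = 0;
    if (nftw(root, count_entry, nopenfd, FTW_PHYS) == -1) {
        perror("nftw");
        return -1;
    }
    return entries_seen;
}
```

Calling `count_entries("/", 20)` and `count_entries("/", 2000)` from a small main() under `time` gives a rough comparison. Repeated runs are affected by the filesystem cache, so the first run is usually not representative.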

Sean Bright
  • I posted this question with the file descriptor value (nopenfd) set to 2000. – Doritos Jan 10 '20 at 17:54
  • @Doritos That's a clash between a lot and not enough. You should try finding an optimal number for your case. Maybe search up what `find` uses while you're at it. – S.S. Anne Jan 10 '20 at 18:23