
I have a directory that contains multiple levels of subdirectories, and I want to print the path of every directory. Currently, I am using:

use File::Find;

find(
    {
        wanted => \&findfiles,
    },
    $maindirectory
);

sub findfiles {
    if (-d) {
        push @arrayofdirs, $File::Find::dir;
    }
}

But each subdirectory contains thousands of files at every level, and the above code takes a long time because it runs the directory test against each file. Is there a way to get the subdirectory paths without examining every file, or some other optimized method?

Edit: This issue got partially resolved but a new issue came up because of this solution. I have listed it here: Multiple File search in varying level of directories in perl

  • The POSIX (and libc) function readdir does not provide the file type for each directory entry, so one needs to do an explicit or implicit (as in `-d`) stat(2) on the name. Using the getdents(2) function this could be optimized, but it is specific to Linux. See [using Linux getdents syscall](https://www.perlmonks.org/?node_id=1148448). – Steffen Ullrich Jul 17 '20 at 19:24
  • If you are on a UNIX/Linux platform then you can try reading output of `find $maindirectory -type d` into your program (see `perldoc -f qx`). That would be faster because a compiled C program (`find`) will be doing all the hard work. Something like `@dirs = split /\0/, qx(find $maindirectory -type d -print0);` should work. – pii_ke Jul 17 '20 at 20:34
  • Also the sample code you gave saves the container directory of each directory to the array, not the directory itself. I think this is not intended. You should use `$File::Find::name` if you wish to save the directory name. – pii_ke Jul 17 '20 at 20:41
  • I'm curious: how many files/levels/directories are we talking about? It runs for me in under half a second on a ~3 GB hierarchy with close to 5k subdirectories. NOTE: you don't want `$File::Find::dir` there but rather `$File::Find::name`. – zdim Jul 17 '20 at 20:42
  • Here I am talking about 20 levels each having about a hundred thousand sub-directories. Each sub-directory may contain about 70 files. – Apurva Choudhary Jul 20 '20 at 13:15
  • One of the problems you likely have is that the more things a directory contains, the slower things get. https://serverfault.com/questions/147731/do-large-folder-sizes-slow-down-io-performance – brian d foy Jul 24 '20 at 19:31
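As the comments point out, the `wanted` callback above saves `$File::Find::dir` (the containing directory) rather than `$File::Find::name` (the path of the entry itself). A corrected sketch, assuming `$maindirectory` is the starting point; the `no_chdir` option stops `File::Find` from calling `chdir()` into every directory, so `$_` holds the full path and the `-d` test applies to it directly:

```perl
use strict;
use warnings;
use File::Find;

my $maindirectory = '.';    # assumed starting point
my @arrayofdirs;

find(
    {
        # With no_chdir set, $_ and $File::Find::name are both the
        # full path, so -d tests the entry we actually want to keep.
        wanted   => sub { push @arrayofdirs, $File::Find::name if -d },
        no_chdir => 1,
    },
    $maindirectory
);

print "$_\n" for @arrayofdirs;
```

Note that `File::Find` still has to `stat` every entry to answer `-d`, so this fixes correctness but not the fundamental cost the question is about.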

1 Answer


If you are on a UNIX/Linux platform, you can read the output of the `find $maindirectory -type d` command into your program (see this answer for a safe way to do that). This command prints the names of the directories under `$maindirectory`. It is faster because a compiled C program (`find`) does all the hard work. The following script should print every directory path found.

Sample script:

use strict;
use warnings;

my $maindirectory = '.';
open my $fh, '-|', 'find', $maindirectory, '-type', 'd' or die "Can't open pipe: $!";
while( my $dir = <$fh>) {
    print $dir;
}
close $fh or warn "Can't close pipe: $!";

Note that there is no point in calling `find` from Perl only to print its output without any processing; in that case you can just as well run `find $maindirectory -type d` in the shell itself.
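If directory names may contain newlines or other awkward characters, the `-print0` idea from the comments combines with the pipe-open above by switching the input record separator to the NUL byte. A sketch under the same assumption that `$maindirectory` is the starting point:

```perl
use strict;
use warnings;

my $maindirectory = '.';

# -print0 terminates each entry with a NUL byte, so even names
# containing embedded newlines are read back unambiguously.
open my $fh, '-|', 'find', $maindirectory, '-type', 'd', '-print0'
    or die "Can't open pipe: $!";

my @dirs;
{
    local $/ = "\0";          # read NUL-terminated records
    while (my $dir = <$fh>) {
        chomp $dir;           # chomp strips the trailing $/ (the NUL)
        push @dirs, $dir;
    }
}
close $fh or warn "Can't close pipe: $!";

print "$_\n" for @dirs;
```

The list form of `open` used here also avoids passing `$maindirectory` through a shell, so spaces or metacharacters in the path are safe.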

pii_ke
  • Have you tested/benchmarked to verify that there is a non-negligible difference in run time by using `find`? If the majority of the Perl program's time is spent doing `stat` calls (which are handled by a compiled C library, and likely the same library used by `find`), then it's very possible that there will be little or no difference. – Dave Sherohman Jul 18 '20 at 08:22
  • @Dave I did some basic testing using `time`. The script using the sample given in the question (after some correction) gives `real *0m1.623s* user 0m1.030s sys 0m0.590s`, while the script in my answer gives `real *0m0.493s* user 0m0.300s sys 0m0.150s`, both on a directory having 870 directories in it. I think my approach is faster because it avoids the `wanted` function call for each directory entry. – pii_ke Jul 18 '20 at 09:36
  • OK, yes, I'd say that's a non-negligible difference. Thanks! – Dave Sherohman Jul 18 '20 at 09:40