4

I have a folder named Lib and I am using the File::Find module to search for it across an entire drive, say D:\. The search takes a long time, as much as 5 minutes if the drive has a lot of subdirectories. How can I find Lib faster, so that the search finishes in seconds?

My code looks like this:

    find( \&Lib_files, $dir);
    sub Lib_files
    {
       return unless -d;
      if ($_=~m/^([L|l]ib(.*))/)
      {
          print"$_";
      }
      return;
    }
brian d foy
User1611

2 Answers

21

Searching the file system without a preexisting index is IO bound. Otherwise, products ranging from locate to Windows Desktop Search would not exist.

At the `D:\>` prompt, type `dir /b/s > directory.lst` and observe how long that command takes to run. You should not expect to beat that without indexing files first.
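To make the indexing idea concrete, here is a minimal sketch: walk the tree once, save every directory path to a file, and answer later searches from that file. The file name `dirs.idx` and the subs `build_index` and `search_index` are hypothetical names chosen for illustration, and the sketch makes no attempt to keep the index fresh when directories are added or removed.

```perl
use strict;
use warnings;
use File::Find;

# Walk the tree once and record every directory path, one per line.
sub build_index {
    my ($root, $file) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!\n";
    find(sub { print {$out} "$File::Find::name\n" if -d }, $root);
    close $out or die "Cannot close $file: $!\n";
    return;
}

# Answer a query from the saved index without touching the tree again.
sub search_index {
    my ($file, $pattern) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!\n";
    chomp(my @paths = <$in>);
    close $in;
    # Test the pattern against the last path component only.
    return grep {
        my $base = (split m{[\\/]}, $_)[-1] // '';
        $base =~ $pattern;
    } @paths;
}

# Usage (assumed): perl index.pl D:\
if (@ARGV) {
    my $index_file = 'dirs.idx';    # hypothetical index location
    build_index($ARGV[0], $index_file) unless -e $index_file;
    print "$_\n" for search_index($index_file, qr/^[Ll]ib/);
}
```

The first run still pays the full cost of the walk; only later runs that reuse the index avoid it.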

One major improvement you can make is to print less often. A minor improvement is not to use capturing parentheses if you are not going to capture:

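    my @dirs;

    sub Lib_files {
        # File::Find has already chdir'ed into the parent directory,
        # so test $_ (the bare entry name) rather than the full path.
        return unless -d;
        if ( /^[Ll]ib/ ) {    # no capturing parentheses: nothing is captured
            push @dirs, $File::Find::name;    # collect now, print once at the end
        }
        return;
    }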
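For completeness, a sketch of how the callback might be wired into a whole script that collects matches and prints them in a single call. The sub name `find_lib_dirs` and the command-line handling are assumptions added for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Return the full paths of all directories under $root whose
# names start with "Lib" or "lib".
sub find_lib_dirs {
    my ($root) = @_;
    my @dirs;
    find(
        sub {
            return unless -d;           # $_ is the entry name in the current directory
            return unless /^[Ll]ib/;    # plain match, no capture groups
            push @dirs, $File::Find::name;
        },
        $root,
    );
    return @dirs;
}

# Usage (assumed): perl findlib.pl D:\
if (@ARGV) {
    my @found = find_lib_dirs($ARGV[0]);
    # One print call for all results instead of one per match.
    print join("\n", @found), "\n" if @found;
}
```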

On my system, a simple script using File::Find to print the names of all subdirectories under my home directory (about 150,000 files) takes a few minutes to run, compared to `dir %HOME% /ad/b/s > dir.lst`, which completes in about 20 seconds.

I would be inclined to use:

    use File::Basename;

    my @dirs = grep { fileparse($_) =~ /^[Ll]ib/ }
               split /\n/,  `dir %HOME% /ad/b/s`;

which completed in under 15 seconds on my system.

If there is a chance that some other dir.exe is in `%PATH%`, cmd.exe's built-in `dir` will not be invoked. You can use `qx! cmd.exe /c dir %HOME% /ad/b/s !` to make sure that the right `dir` is invoked.
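Putting the pieces together, a hedged sketch of the `cmd.exe /c dir` approach with the filtering split into its own sub so it can be checked without a Windows machine. The sub name `lib_dirs_from_listing`, the quoting of the root path, and the `fileparse_set_fstype('MSWin32')` call (so paths are parsed with Windows rules on any platform) are choices made for this example, not part of the answer above.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Parse paths with Windows rules regardless of where the script runs.
fileparse_set_fstype('MSWin32');

# Keep only the paths whose last component starts with "Lib" or "lib".
sub lib_dirs_from_listing {
    my (@lines) = @_;
    return grep {
        my ($name) = fileparse($_);    # first return value is the file name part
        $name =~ /^[Ll]ib/;
    } @lines;
}

# Usage (assumed, Windows only): perl dirlib.pl D:\
if ($^O eq 'MSWin32' && @ARGV) {
    my $root  = $ARGV[0];
    # Go through cmd.exe explicitly so the built-in dir is the one that runs.
    my @lines = split /\r?\n/, qx{cmd.exe /c dir "$root" /ad /b /s};
    print join("\n", lib_dirs_from_listing(@lines)), "\n";
}
```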

Sinan Ünür
  • +1 for not using capturing parentheses - but on the whole, it is likely to be a second-order effect compared to disk access time. – Jonathan Leffler Nov 23 '09 at 15:34
  • @Jonathan Leffler: Yeah, I should correct how I phrased that because the most important savings will come from not printing so often. However, it will be hard to beat `qx'dir d:\ /ad/b/s'` for this kind of thing. – Sinan Ünür Nov 23 '09 at 15:40
  • Don't you mean `/[Ll]ib/`? `[L|l]` is (roughly) equivalent to `(?:L|\||l)`, not `(?:L|l)`. – Robert P Nov 23 '09 at 21:27
  • @Robert P: My mistake illustrates well the perils of copy & paste without thinking. Thank you for the correction. – Sinan Ünür Nov 23 '09 at 22:28
  • Hi, can you give me the exact code? The code above displays "file not found" when I execute it. – User1611 Nov 24 '09 at 06:04
  • @lokesh That is the exact code I ran. **You** need to **replace** `%HOME%` with whatever directory you are trying to search. – Sinan Ünür Nov 24 '09 at 12:01
  • Hi, I am getting the result "File not found" when I run the script use File::Basename; my @dirs = grep { fileparse($_) =~ /^[L|l]ib/ } split /\n/, `dir e:\\/ad/b/s`; print @dirs; What might be the problem? – User1611 Dec 01 '09 at 05:49
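Robert P's point about the character class can be checked with a short script. The sub names and the sample names below are made up for illustration. Of the four samples, only `|ib` is treated differently by the two patterns, and since `|` is not allowed in Windows file names, the difference would only show up on other file systems.

```perl
use strict;
use warnings;

# Inside [...] the | is a literal character, not alternation.
sub matches_original  { return $_[0] =~ /^[L|l]ib/ ? 1 : 0 }    # class holds L, |, l
sub matches_corrected { return $_[0] =~ /^[Ll]ib/  ? 1 : 0 }    # class holds L, l

for my $name ('Lib', 'lib', '|ib', 'glib') {
    printf "%-5s original=%d corrected=%d\n",
        $name, matches_original($name), matches_corrected($name);
}
```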
-1

How about not using the File::Find module?

    use Cwd;
    sub find{
        my ($wdir) = shift;
        my ($sdir) = &cwd;    # remember the starting directory
        chdir($wdir) or die "Unable to enter dir $wdir:$!\n";
        opendir(DIR, ".") or die "Unable to open $wdir:$!\n";
        # readdir in list context reads every entry before the loop starts
        foreach my $name (readdir(DIR) ){
            next if ($name eq ".");
            next if ($name eq "..");
            if (-d $name){
                &find($name);
                next;
            }

            print $name ."\n";
        }
        closedir(DIR);
        # go back only after the whole directory has been processed
        chdir($sdir) or die "Unable to change to dir $sdir:$!\n";
    }
    &find(".");
ghostdog74
  • One way to optimise this would be to store each directory name in a local variable so that you can close the directory handle before recursing. The example given, unfortunately, holds each directory handle open while it inspects subdirectories, and with a very deep path you could starve your system of handles. – PP. Nov 27 '09 at 11:30
  • **1)** Don't use `&` to invoke subs unless you know why. See http://perldoc.perl.org/perlsub.html **2)** `$name =~ /^\.\.?\z/` is perfectly fine. **3)** Did you time your code? Why do you think this would be faster? – Sinan Ünür Dec 01 '09 at 15:50
  • Let me tell you why. 1) The & is there for historical reasons; that is how I wrote it in the old days. 2) I just want to use string comparison and not a regex, what's your problem? 3) Did you time mine? 4) Because it is not calling the find module but using Perl's internal constructs. Reasonable? – ghostdog74 Dec 01 '09 at 23:24
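The timing question raised in these comments can be settled by measurement. Below is a rough sketch that counts directories both ways and reports wall-clock time with Time::HiRes. The subs `count_with_file_find` and `count_with_readdir` are hypothetical helpers written for this comparison; the second is an iterative walk without `chdir`, not the answer's code.

```perl
use strict;
use warnings;
use File::Find;
use Time::HiRes ();

# Count directories (including the root) with File::Find.
sub count_with_file_find {
    my ($root) = @_;
    my $count = 0;
    find(sub { $count++ if -d }, $root);
    return $count;
}

# Count directories (including the root) with a hand-rolled walk.
sub count_with_readdir {
    my ($root) = @_;
    my $count = 1;                        # the root itself
    my @todo  = ($root);
    while (defined(my $dir = shift @todo)) {
        opendir(my $dh, $dir) or next;    # skip unreadable directories
        my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;                     # closed before descending further
        for my $name (@names) {
            my $path = "$dir/$name";
            next unless -d $path;
            $count++;
            push @todo, $path unless -l $path;    # do not descend into symlinks
        }
    }
    return $count;
}

# Usage (assumed): perl timing.pl D:\
if (@ARGV) {
    for my $case ([ 'File::Find', \&count_with_file_find ],
                  [ 'readdir',    \&count_with_readdir ]) {
        my ($label, $code) = @$case;
        my $start = Time::HiRes::time();
        my $n     = $code->($ARGV[0]);
        printf "%-10s %d directories in %.2f s\n",
            $label, $n, Time::HiRes::time() - $start;
    }
}
```

Whichever walk runs second benefits from the operating system's file cache, so it is worth running the script more than once, or swapping the order, before drawing conclusions.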