You can use the map() built-in function to create a hash with the file names as the keys and the values set to undef, 1, or some other more useful value:
    perl -E 'map { $filehash{$_} = undef }
        qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say keys %filehash ;'
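With undef values the hash still behaves as a set, so membership tests with exists work. A sketch (note the added chomp, so lookups don't need a trailing newline; the ./some/file.txt path is just a placeholder):

    perl -E '%filehash = map { chomp; ($_ => undef) }
        qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say exists $filehash{"./some/file.txt"} ? "found" : "missing";'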
For example, you could use the file extension as the value for each hash key (grabbing the extension with fileparse() from File::Basename):
    perl -MFile::Basename -E '
        map { chomp; $filehash{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2] }
            qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say "$_ has $filehash{$_} extension" for keys %filehash ;'
You could then filter the hash by value:
    perl -MFile::Basename -E '
        map { chomp; $files_ext{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2] }
            qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        for $k (keys %files_ext) { say $k if $files_ext{$k} eq ".pdf" } ;'
You could then rewrite this as a script:
    use v5.22;
    use File::Basename ;
    use List::Util 'any';

    my %files_ext ;
    my @ext = qw(.doc .xls) ;
    my @list = qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;

    map {
        chomp;
        $files_ext{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2]
    } @list ;

    for my $k (keys %files_ext) {
        say $k if ( any { $_ eq $files_ext{$k} } @ext ) ;
    }
But rather than building a hash to filter files this way, you could use one of the various modules that will help you find files in Perl without running a system command to do so, e.g. File::Find, which comes with the core Perl distribution. From CPAN, one of my favorites is Path::Iterator::Rule. Since your question asks how to add the output of find to a hash, my answer focuses on that approach.
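For comparison, here is one way the same extension filter might look with the core File::Find module. This is only a sketch: File::Find has no built-in depth limit, so it mimics find's -maxdepth 3 by counting path separators and pruning.

    use strict;
    use warnings;
    use File::Find;
    use File::Basename;
    use List::Util 'any';

    my @exts = qw(.doc .xls);
    my @found;

    find(
        sub {
            # $File::Find::name is the full path; counting its
            # slashes gives the depth below the start directory.
            my $depth = () = $File::Find::name =~ m{/}g;
            if (-d) {
                # Do not descend below three levels (-maxdepth 3).
                $File::Find::prune = 1 if $depth >= 3;
                return;
            }
            return unless -f;    # plain files only, like -type f
            my $ext = ( fileparse($_, qr/\..[^.]*$/) )[2];
            push @found, $File::Find::name if any { $_ eq $ext } @exts;
        },
        '.'
    );

    print "$_\n" for @found;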
Here is a script that uses Path::Iterator::Rule to find files and then filters the results as above.
    use File::Basename ;
    use List::Util 'any';
    use Path::Iterator::Rule;

    my @exts = qw(.doc .xls);
    # ->file restricts matches to files, like find -type f above
    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);
    my @files = $rule->all( "." ) ;

    for my $file ( @files ) {
        if ( any { $_ eq ( fileparse($file, qr/\..[^.]*$/) )[2] } @exts ) {
            print "$file\n" ;
        }
    }
On a large set of files it may be possible to make this faster (see the PERFORMANCE section of the Path::Iterator::Rule documentation) by replacing the ->all() method with ->all_fast(), or by shifting the filtering portion (i.e. the calls to any() and fileparse()) into a custom rule that uses an anonymous subroutine sub { ... } to directly construct the filtered list of files.
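The first swap is a one-line change; a sketch, reusing the $rule and loop from the script above:

    my @files = $rule->all_fast( "." ) ;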
Using the "lazy" iterator methods ->iter() or ->iter_fast() instead of the list interface also seems to help:
    use File::Basename;
    use List::Util 'any';
    use Path::Iterator::Rule;

    my @exts = qw(.doc .xls);
    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);

    # Custom rule: the coderef sees each candidate path as $_ and
    # returns true only when its extension is in @exts.
    $rule->and(
        sub {
            my $ext = ( fileparse($_, qr/\..[^.]*$/) )[2];
            any { $_ eq $ext } @exts;
        }
    );

    my $next = $rule->iter_fast(".");
    while (defined(my $file = $next->())) {
        print "$file\n";
    }
On my system, using a system call to the Unix find command is fastest of all. Fast isn't always "best", though: the Perl modules can give you error handling and safety that you otherwise would not get by simply slurping the output of a system command.
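For example, Path::Iterator::Rule's traversal methods accept an options hashref with an error_handler callback. A sketch, assuming the documented callback arguments of file name and error message:

    use Path::Iterator::Rule;

    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);

    # Warn about unreadable entries and keep going instead of dying.
    my $next = $rule->iter(
        ".",
        { error_handler => sub { warn "skipping $_[0]: $_[1]" } },
    );

    while (defined(my $file = $next->())) {
        print "$file\n";
    }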
Other References
Finding files with Perl has some good responses to a more specific File::Find question and has some good links in its Related section.
Your question is also more generally about data structures in Perl, so you probably want to read the "Perl Data Structures Cookbook" documentation, available on your system as perldoc perldsc.
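A natural next step from the flat hash used above is the "hash of arrays" structure that perldsc describes, e.g. grouping the found files by extension. A sketch, reusing fileparse() as before:

    use v5.22;
    use File::Basename;

    my %by_ext;
    for my $file ( qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ) {
        chomp $file;
        my $ext = ( fileparse($file, qr/\..[^.]*$/) )[2];
        # Each value is an array ref of the files sharing that
        # extension: a "hash of arrays" (see perldoc perldsc).
        push @{ $by_ext{$ext} }, $file;
    }
    say scalar @{ $by_ext{".pdf"} // [] }, " .pdf files found";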