You can use the map() built-in function to create a hash with the file names as the keys and the values set to undef, 1, or some other more useful value:
    perl -E 'map { $filehash{$_} = undef }
        qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say keys %filehash ;'
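With undef values the hash still behaves as a set, so membership tests with exists work. A sketch (note the added chomp, so lookups don't need a trailing newline; the ./some/file.txt path is just a placeholder):

    perl -E '%filehash = map { chomp; ($_ => undef) }
        qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say exists $filehash{"./some/file.txt"} ? "found" : "missing";'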
For example, you could use the file extension as the value for each hash key (grabbing the extension with fileparse() from File::Basename):
    perl -MFile::Basename -E '
        map { chomp; $filehash{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2] }
            qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        say "$_ has $filehash{$_} extension" for keys %filehash ;'
You could then filter the hash by value:
    perl -MFile::Basename -E '
        map { chomp; $files_ext{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2] }
            qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;
        for $k (keys %files_ext) { say $k if $files_ext{$k} eq ".pdf" } ;'
You could then rewrite this as a script:
    use v5.22;
    use File::Basename ;
    use List::Util 'any';

    my %files_ext ;
    my @ext = qw(.doc .xls) ;
    my @list = qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ;

    map {
        chomp;
        $files_ext{$_} = ( fileparse($_, qr/\..[^.]*$/) )[2]
    } @list ;

    for my $k (keys %files_ext) {
        say $k if ( any { $_ eq $files_ext{$k} } @ext ) ;
    }
But rather than building a hash to filter files this way, you could use one of the various modules that will help you find files in Perl without running a system command to do so, e.g. File::Find, which comes with the core Perl distribution. From CPAN, one of my favorites is Path::Iterator::Rule. Since your question asks how to add the output of find to a hash, my answer focuses on that approach.
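For comparison, here is one way the same extension filter might look with the core File::Find module. This is only a sketch: File::Find has no built-in depth limit, so it mimics find's -maxdepth 3 by counting path separators and pruning.

    use strict;
    use warnings;
    use File::Find;
    use File::Basename;
    use List::Util 'any';

    my @exts = qw(.doc .xls);
    my @found;

    find(
        sub {
            # $File::Find::name is the full path; counting its
            # slashes gives the depth below the start directory.
            my $depth = () = $File::Find::name =~ m{/}g;
            if (-d) {
                # Do not descend below three levels (-maxdepth 3).
                $File::Find::prune = 1 if $depth >= 3;
                return;
            }
            return unless -f;    # plain files only, like -type f
            my $ext = ( fileparse($_, qr/\..[^.]*$/) )[2];
            push @found, $File::Find::name if any { $_ eq $ext } @exts;
        },
        '.'
    );

    print "$_\n" for @found;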
Here is a script that uses Path::Iterator::Rule to find files and then filters the results as above.
    use File::Basename ;
    use List::Util 'any';
    use Path::Iterator::Rule;

    my @exts = qw(.doc .xls);
    # ->file restricts matches to files, like find -type f above
    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);
    my @files = $rule->all( "." ) ;

    for my $file ( @files ) {
        if ( any { $_ eq ( fileparse($file, qr/\..[^.]*$/) )[2] } @exts ) {
            print "$file\n" ;
        }
    }
On a large set of files it may be possible to make this faster (see the PERFORMANCE section of the Path::Iterator::Rule documentation) by replacing the ->all() method with ->all_fast(), or by shifting the filtering portion (i.e. the calls to any() and fileparse()) into a custom rule that uses an anonymous subroutine sub { ... } to directly construct the filtered list of files.
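The first swap is a one-line change; a sketch, reusing the $rule and loop from the script above:

    my @files = $rule->all_fast( "." ) ;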
Using the "lazy" iterator methods ->iter() or ->iter_fast() instead of the list interface also seems to help:
    use File::Basename;
    use List::Util 'any';
    use Path::Iterator::Rule;

    my @exts = qw(.doc .xls);
    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);

    # Custom rule: the coderef sees each candidate path as $_ and
    # returns true only when its extension is in @exts.
    $rule->and(
        sub {
            my $ext = ( fileparse($_, qr/\..[^.]*$/) )[2];
            any { $_ eq $ext } @exts;
        }
    );

    my $next = $rule->iter_fast(".");
    while (defined(my $file = $next->())) {
        print "$file\n";
    }
On my system, using a system call to the Unix find command is fastest of all. Fast isn't always "best", though: the Perl modules can give you error handling and safety that you otherwise would not get by simply slurping the output of a system command.
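For example, Path::Iterator::Rule's traversal methods accept an options hashref with an error_handler callback. A sketch, assuming the documented callback arguments of file name and error message:

    use Path::Iterator::Rule;

    my $rule = Path::Iterator::Rule->new()->file->max_depth(3);

    # Warn about unreadable entries and keep going instead of dying.
    my $next = $rule->iter(
        ".",
        { error_handler => sub { warn "skipping $_[0]: $_[1]" } },
    );

    while (defined(my $file = $next->())) {
        print "$file\n";
    }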
Other References
Finding files with Perl has some good responses to a more specific File::Find question and has some good links in its Related section.
Your question is also more generally about data structures in Perl, so you probably want to read the "Perl Data Structures Cookbook" documentation, available on your system as perldoc perldsc.
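A natural next step from the flat hash used above is the "hash of arrays" structure that perldsc describes, e.g. grouping the found files by extension. A sketch, reusing fileparse() as before:

    use v5.22;
    use File::Basename;

    my %by_ext;
    for my $file ( qx( find ./ -maxdepth 3 -type f 2>/dev/null ) ) {
        chomp $file;
        my $ext = ( fileparse($file, qr/\..[^.]*$/) )[2];
        # Each value is an array ref of the files sharing that
        # extension: a "hash of arrays" (see perldoc perldsc).
        push @{ $by_ext{$ext} }, $file;
    }
    say scalar @{ $by_ext{".pdf"} // [] }, " .pdf files found";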