
In Perl, I read in files from a directory, and I want to open them all simultaneously (but read them line by line) so that I can perform a function that uses all of their nth lines together (e.g. concatenation).

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);
my @files;
for my $i (0..$#temps) {
  my $file;
  open($file,"<",$temps[$i]);
  push(@files,$file);
}
my $concat;
for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat.=$blah;
}
print $concat;

I just get a bunch of errors: "use of uninitialized value" and GLOB(...) errors. So how can I make this work?

Jeremy
    **`Always`** put `use strict; use warnings;` at the beginning of a Perl program. Do that until you know exactly why you should do that. – Brad Gilbert Sep 30 '09 at 18:55
  • 4
    "Do that until you know exactly why you should do that." And then keep doing it unless you have a very specific reason not to and know exactly what will happen because of it. – Chris Lutz Sep 30 '09 at 19:18
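As the comments suggest, `use strict; use warnings;` points straight at the core bug in the snippet above. Here is a minimal sketch (with made-up file contents) of the `@blah`/`$blah` mix-up that warnings would catch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Under "use strict", an undeclared variable is a compile-time error,
# and "use warnings" flags reads of uninitialized values.
my @blah = ("first line\n", "second line\n");
my $blah;             # a *different* variable from @blah - never assigned
my $concat = '';
# $concat .= $blah;   # would warn: "Use of uninitialized value"
$concat .= $blah[0];  # what was actually intended: element 0 of @blah
print $concat;        # prints "first line"
```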

4 Answers


There are a lot of issues here, starting with the call to "ls | grep" :)

Let's start with some code:

First, let's get a list of files:

my @files = glob( '*.txt' );

But it would be better to test whether each name actually refers to a plain file rather than a directory:

my @files = grep { -f } glob( '*.txt' );

Now, let's open these files to read them:

my @fhs = map { open my $fh, '<', $_; $fh } @files;

But we need a way to handle errors; in my opinion the best way is to add:

use autodie;

at the beginning of the script (and install autodie first, if you don't have it yet). Alternatively you can:

use Fatal qw( open );

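With autodie in effect, a failed open throws an exception instead of silently returning false. A minimal sketch (the path below is made up and assumed not to exist):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

# With autodie, open() dies on failure, so errors can't be
# silently ignored; here we trap the exception with eval.
my $ok = eval {
    open my $fh, '<', '/no/such/file/for-this-demo';
    1;
};
my $error = $@;
print $ok ? "opened\n" : "open failed: $error";
```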
Now that we have that, let's get the first line (as you showed in your example) from each of the inputs and concatenate them:

my $concatenated = '';

for my $fh ( @fhs ) {
    my $line = <$fh>;
    $concatenated .= $line;
}

This is perfectly fine and readable, but it can still be shortened, while maintaining (in my opinion) readability, to:

my $concatenated = join '', map { scalar <$_> } @fhs;

The effect is the same: $concatenated contains the first lines of all the files.
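Here is a self-contained run of that one-liner; the two .txt files, their names, and their contents are invented purely for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Create two throwaway .txt files to read from.
my $dir = tempdir( CLEANUP => 1 );
open my $fh1, '>', "$dir/a.txt" or die $!;
print {$fh1} "a1\na2\n";
close $fh1;
open my $fh2, '>', "$dir/b.txt" or die $!;
print {$fh2} "b1\nb2\n";
close $fh2;

my @files = grep { -f } glob( "$dir/*.txt" );
my @fhs   = map  { open my $fh, '<', $_ or die $!; $fh } @files;

# "scalar <$_>" forces a single-line read inside map's list context
my $concatenated = join '', map { scalar <$_> } @fhs;
print $concatenated;   # the first line of each file, in glob order
```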

So, whole program would look like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
# use Fatal qw( open ); # uncomment if you don't have autodie

my @files        = grep { -f } glob( '*.txt' );
my @fhs          = map { open my $fh, '<', $_; $fh } @files;
my $concatenated = join '', map { scalar <$_> } @fhs;

Now, it might be that you want to concatenate not just the first lines, but all of them. In that case, instead of the $concatenated = ... code, you'd need something like this:

my $concatenated = '';

while (my $fh = shift @fhs) {
    my $line = <$fh>;
    if ( defined $line ) {
        push @fhs, $fh;
        $concatenated .= $line;
    } else {
        close $fh;
    }
}
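With two small demo files (names and contents invented here, one file deliberately longer than the other), the queue-based loop above interleaves the nth lines until every handle is exhausted:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Demo input files.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/x.txt" or die $!;
print {$out} "x1\nx2\nx3\n";
close $out;
open $out, '>', "$dir/y.txt" or die $!;
print {$out} "y1\ny2\n";
close $out;

my @fhs = map  { open my $fh, '<', $_ or die $!; $fh }
          grep { -f } glob( "$dir/*.txt" );

my $concatenated = '';
# Treat @fhs as a queue: a handle that still has lines goes back on
# the end; an exhausted one is closed and dropped.
while ( my $fh = shift @fhs ) {
    my $line = <$fh>;
    if ( defined $line ) {
        push @fhs, $fh;
        $concatenated .= $line;
    }
    else {
        close $fh;
    }
}
print $concatenated;   # lines interleaved: x1, y1, x2, y2, x3
```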
  • +1 your code is better than mine. I would love to maintain this kind of code. Although for completeness it might be noted that `glob()` is considered a somewhat unsafe function, and might not work universally. I can't find a reference for that (you might search StackOverflow and see if you can find anything about it - I remember it from a comment, but don't know where to look at this point). – Chris Lutz Sep 30 '09 at 18:40
  • @Chris: Hmm .. never heard about it, but it's possible. In this case - opendir, readdir + grep, closedir should be sufficient. –  Sep 30 '09 at 19:37
  • I think that the complaints about `glob` refer to older versions of the function. (It used to use the C-shell?) Nevertheless, here's one Perl coder who doesn't like it and why: http://sial.org/blog/2008/01/many_small_errors.html – Telemachus Sep 30 '09 at 20:33
  • glob, with a fixed string argument, on Perl >= 5.6, is quite safe. The only convincing argument in that blog post is that spaces may cause trouble, but `"*.txt"` is known not to contain spaces. :) – hobbs Sep 30 '09 at 20:38
  • @Paul - Have you tried reading other comments? Like - mine comment from 3 hours ago: " ... In this case - opendir, readdir + grep, closedir should be sufficient" :) –  Sep 30 '09 at 22:42
  • @Paul: A few things worth saying for `glob`: (1) no dotfiles (unless you ask for them), (2) order is guaranteed, (3) no need to prepend the directory on the results in order to do filetests. – Telemachus Sep 30 '09 at 23:30

Here is your problem:

for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat .= $blah;
}

First, <$files[$i]> isn't a valid filehandle read. This is the source of your GLOB(...) errors. See mobrule's answer for why this is the case. So change it to this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah;
}

Second problem: you're mixing @blah (an array named blah) and $blah (a scalar named blah). This is the source of your "uninitialized value" errors - $blah (the scalar) has never been initialized, but you're using it. If you want the $n-th line from @blah, use this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah[$n];
}

I don't want to keep beating a dead horse, but I do want to address a better way to do something:

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);

This reads in a list of all files in the current directory that have a ".txt" extension in them. This works, and is effective, but it can be rather slow - we have to call out to the shell, which has to fork off to run ls and grep, and that incurs a bit of overhead. Furthermore, ls and grep are simple and common programs, but not exactly portable. Surely there's a better way to do this:

my @temps;
opendir(my $dirhandle, ".") or die "Can't open current directory: $!";
while (my $file = readdir($dirhandle)) {
  push @temps, $file if $file =~ /\.txt/;
}
closedir($dirhandle);

Simple, short, pure Perl: no forking, no non-portable shells, and we don't have to read in one big string and then split it - we store only the entries we really need. Plus, it becomes trivial to modify the condition for files that pass the test. Say we end up accidentally reading the file test.txt.gz because our regex matches; we can easily change that line to:

  push @temps, $file if $file =~ /\.txt$/;

We could do that one with grep too (I believe), but why settle for grep's limited regular expressions when Perl has one of the most powerful regex engines anywhere built in?
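For instance, a filter that would be awkward in grep's basic regular expressions is easy here. The file names and the pattern below are just an illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept data-2021.txt or report.TXT, but skip editor backups like
# notes.txt~ and compressed files like test.txt.gz - one pattern,
# case-insensitive, anchored at the end of the name with \z.
my @names = qw( data-2021.txt report.TXT notes.txt~ test.txt.gz README );
my @keep  = grep { /\.txt\z/i } @names;
print "@keep\n";   # only the two real .txt files survive
```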

Chris Lutz

Use the readline builtin instead of the <> operator:

my @blah = readline( $files[$i] );

The <> operator only performs a filehandle read on a bareword filehandle or a simple scalar variable; anything more complex, such as <$files[$i]>, is interpreted as the file glob operator instead of the read-from-filehandle operator. readline() accepts an arbitrary expression. (Alternatively, copy the handle into a plain scalar first: my $fh = $files[$i]; my @blah = <$fh>;)
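A quick self-contained sketch of the working read, with a demo file created on the fly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Write a one-line demo file.
my ( $fh, $name ) = tempfile( UNLINK => 1 );
print {$fh} "hello\n";
close $fh;

open my $in, '<', $name or die $!;
my @files = ( $in );

# <$files[0]> would be parsed as a glob of the pattern, not a read;
# readline() takes an arbitrary expression that yields a filehandle.
my $line = readline( $files[0] );
print $line;   # "hello"
```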

mob

You've got some good answers already. Another way to tackle the problem is to create a list-of-lists containing all of the lines from the files (@content). Then use the each_arrayref function from List::MoreUtils, which will create an iterator that yields line 1 from all files, then line 2, etc.

use strict;
use warnings;
use List::MoreUtils qw(each_arrayref);

my @content =
    map {
        open(my $fh, '<', $_) or die $!;
        [<$fh>]
    }
    grep {-f}
    glob '*.txt'
;
my $iterator = each_arrayref @content;
while (my @nth_lines = $iterator->()){
    # Do stuff with @nth_lines;
}
FMc