
I am opening a directory that has files that look like the following. Here is one file:

    >UVWXY
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    >STUVW
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    >QRSTU
    ABCDEFGHIJKLMNOPQRSTUVWXYZ 

Here is a second file:

    >EFGHI
    ABCDEFGHIJKLMNOPQRSTUVWXYZ

Here is my code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my ($directory) = @ARGV;
    my $dir = "$directory";
    my @ArrayofFiles = glob "$dir/*";

    open(OUT, ">", "/path/to/output.txt") or die $!;

    foreach my $file(@ArrayofFiles){
        open(my $fastas, $file) or die $!;
        my $numberoffastas = grep{/>/}<$fastas>;
        #print $numberoffastas, "\n";
        while (my $line = <$fastas>){
            print $line, "\n";
        }
    }

Nothing is printed for `$line`, even though the code correctly counts the number of ">" characters in each file it opens, as I can see by printing `$numberoffastas`.
How can I fix this code so that $line = something like:

     >EFGHI 

or

    ABCDEFGHIJKLMNOPQRSTUVWXYZ  

Thanks

Ed Morton
Rob
  • Do you need to print (or to have) the number of `>` before printing the lines? – Casimir et Hippolyte Jul 22 '16 at 21:35
  • I am going to set up a conditional. When the number of > equals the number of lines that are processed per file, I am going to print out a unique line that signifies the end of the information from that file. All data will go into a single file, so having the information per file clearly delimited is necessary. – Rob Jul 22 '16 at 21:39
  • So you need the number of `>` *only after* you write all the lines to the file. In this case use a single loop where you increment the number of `>` *(when there is one)* and where you print the current line to the output file. You don't need to parse the entire file twice. – Casimir et Hippolyte Jul 22 '16 at 21:59
  • If I ask another question about this, would you be willing to answer with a demonstration? – Rob Jul 22 '16 at 22:05
  • Why not, but I need to be sure I understand what you are trying to do. – Casimir et Hippolyte Jul 22 '16 at 22:07
  • I will try my best to be clear. Thanks. – Rob Jul 22 '16 at 22:07
  • @Rob: There is no need to count the number of header lines before processing your file. You can print your summary line after the `while` statement, when all of the data has been read from the file. – Borodin Jul 23 '16 at 16:41
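
The single-pass approach suggested in the comments above might look roughly like the following sketch. It is not the asker's final code: the `copy_fasta` helper, the delimiter text, and the in-memory sample data are all hypothetical, chosen only so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out, counting
# header lines (those starting with ">") in the same pass.
sub copy_fasta {
    my ($in, $out) = @_;
    my $headers = 0;
    while (my $line = <$in>) {
        chomp $line;
        $headers++ if $line =~ /^>/;
        print {$out} "$line\n";
    }
    # The count is known only after the loop, which is exactly when
    # the end-of-file marker (placeholder text) needs to be written.
    print {$out} "# end of file ($headers records)\n";
    return $headers;
}

# Sample data read through an in-memory filehandle (an assumption,
# standing in for one of the files in the directory).
my $data = ">UVWXY\nABC\n>STUVW\nABC\n";
open(my $in, '<', \$data) or die $!;

my $result = '';
open(my $out, '>', \$result) or die $!;

my $n = copy_fasta($in, $out);
close $out;
print $result;
```

Because the marker is written after the `while` loop, the file never has to be read twice, and the count never has to be known in advance.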

1 Answer

    my $numberoffastas = grep{/>/}<$fastas>;

calls `readline` on the `$fastas` filehandle in list context, which consumes all the input on the filehandle. By the time you reach `while (my $line = <$fastas>)`, there is no more input on that filehandle to provide, so the `while` condition fails on the first test and the loop body never runs.
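
A small sketch can make the effect visible. The sample data and the in-memory filehandle are assumptions, used here only so the snippet runs without a real file:

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $data = ">EFGHI\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n";
open(my $fh, '<', \$data) or die $!;

# grep imposes list context on <$fh>, so every line is read here.
my $count = grep { /^>/ } <$fh>;

# The handle now sits at end-of-file.
my $at_eof = eof($fh) ? 1 : 0;
my $next   = <$fh>;    # scalar context: returns undef at EOF

print "headers: $count\n";
print "at eof: $at_eof\n";
print "next line defined: ", (defined $next ? "yes" : "no"), "\n";
```

The final read returns `undef`, which is the same value that ends the `while` loop in the question before its body ever executes.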

Save the inputs in an array and perform both operations on the array:

    my @inp = <$fastas>;
    my $numberoffastas = grep {/>/} @inp;
    ...
    foreach my $line (@inp) {
        ...
    }

or, if you are worried that the files are too large and will give you memory headaches, reopen the file:

    my $numberoffastas = grep {/>/} <$fastas>;
    close $fastas;
    open $fastas, '<', $file or die $!;
    ...
    while (my $line = <$fastas>) { ... }

or seek to the beginning of the file:

    open my $fastas, '<', $file or die $!;   #   read-only mode is enough for seek
    my $numberoffastas = grep {/>/} <$fastas>;
    ...
    seek $fastas, 0, 0;              #   rewind to beginning of file
    while (my $line = <$fastas>) { ... }
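
As a rough, self-contained check of the rewind approach, the sketch below counts headers, seeks back, and reads the lines again. The in-memory filehandle and sample data are assumptions; `SEEK_SET` from the core `Fcntl` module is used in place of the literal `0` for readability. Since `seek` only repositions the handle, an ordinary read mode is used here.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical sample data standing in for one FASTA file.
my $data = ">QRSTU\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n";
open(my $fh, '<', \$data) or die $!;

# First pass: count header lines (this reads to end-of-file).
my $count = grep { /^>/ } <$fh>;

# Rewind so the second pass starts from the first line again.
seek($fh, 0, SEEK_SET) or die "Cannot rewind: $!";

# Second pass: collect the lines one at a time.
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}

print "$count header(s), ", scalar(@lines), " line(s)\n";
```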
mob