
I am setting up a hash reference containing file handles.

The fourth column of my input file contains an identifier field that I am using to name the file handle's destination:

col1    col2    col3    id-0008    col5
col1    col2    col3    id-0002    col5
col1    col2    col3    id-0001    col5
col1    col2    col3    id-0001    col5
col1    col2    col3    id-0007    col5
...
col1    col2    col3    id-0003    col5

I use GNU core utilities to get a list of the identifiers:

$ cut -f4 myFile | sort | uniq
id-0001
id-0002
...

There can be more than 1024 unique identifiers in this column, and I need to open a file handle for each identifier and put that handle into a hash reference.

use File::Path qw(mkpath);

my $rootDir = "/foo/bar";   # root and subdirectory match the output paths below
my $subDir = "baz";
my $fhsRef;
my $fileOfInterest = "/foo/bar/fileOfInterest.txt";

openFileHandles($fileOfInterest);
closeFileHandles();

sub openFileHandles {
    my ($fn) = @_;

    print STDERR "getting set names... (this may take a few moments)\n";
    my $resultStr = `cut -f4 $fn | sort | uniq`;
    chomp($resultStr);
    my @setNames = split("\n", $resultStr);

    foreach my $setName (@setNames) {
        my $destDir = "$rootDir/$subDir/$setName";
        if (! -d $destDir) { mkpath $destDir; }
        my $destFn = "$destDir/coordinates.bed";
        local *FILE;
        print STDERR "opening handle to: $destFn\n";
        open (FILE, ">", $destFn) or die "could not open handle to $destFn\n$?";
        $fhsRef->{$setName}->{fh} = *FILE;
        $fhsRef->{$setName}->{fn} = $destFn;
    }
}

sub closeFileHandles {
    foreach my $setName (keys %{$fhsRef}) {
        print STDERR "closing handle to: ".$fhsRef->{$setName}->{fn}."\n";
        close $fhsRef->{$setName}->{fh};
    }
}

The problem is that my code is dying at the equivalent of id-1022:

opening handle to: /foo/bar/baz/id-0001/coordinates.bed
opening handle to: /foo/bar/baz/id-0002/coordinates.bed
...
opening handle to: /foo/bar/baz/id-1022/coordinates.bed
could not open handle to /foo/bar/baz/id-1022/coordinates.bed
0
6144 at ./process.pl line 66.

Is there an upper limit in Perl to the number of file handles I can open or store in a hash reference? Or have I made another mistake elsewhere?

Alex Reynolds
  • Perl doesn't impose a limit, but your OS surely does. (1024, it seems: STDIN + STDOUT + STDERR + 1021.) The limit may be configurable. By the way, you should be printing `$!`, not `$?`. – ikegami Aug 05 '11 at 20:03
  • http://perldoc.perl.org/FileCache.html FileCache is a standard module that is supposed to allow you to exceed the OS limit for open files. – d5e5 Aug 05 '11 at 20:10
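
For reference, a minimal sketch of the FileCache approach d5e5 suggests, assuming the identifier sits in the fourth tab-delimited column as in the question (the maxopen value and output layout are illustrative, not tested code):

use strict;
use warnings;
use File::Path qw(mkpath);
use FileCache maxopen => 512;   # keep at most ~512 handles open at once

while (my $line = <STDIN>) {
    my @fields = split /\t/, $line;
    my $setName = $fields[3];   # fourth column holds the identifier
    mkpath($setName) unless -d $setName;
    # cacheout opens with '>' on first use and '>>' thereafter,
    # closing older cached handles as needed to stay under maxopen
    my $fh = cacheout "$setName/coordinates.bed";
    print $fh $line;
}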

2 Answers


There is a limit on the number of open files per process, regardless of the programming language.

This limit is imposed by the operating system to prevent malicious (or buggy) programs from consuming all of the system's resources, which could freeze the OS.

If you are using a Linux-based (non-Mac) OS, check out ulimit and /etc/security/limits.conf.

ulimit -n 2048

This should work on most Linux distros.
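
For a persistent change, the per-user limits in /etc/security/limits.conf can be raised; a sketch, where youruser is a placeholder for the actual account name:

youruser    soft    nofile    2048
youruser    hard    nofile    4096

The new values take effect at the next login session.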

I don't know the equivalent configuration for Mac OS X (it differs from other Unix-like systems on this specific point) or for Windows.


Edit:

The limit on OS X is set using the launchctl tool:

launchctl limit maxfiles 2048 unlimited
Vivien Barousse
  • Before doing this sort of trickery, rethink your approach to see if you really need to have thousands of files open at the same time. – brian d foy Aug 05 '11 at 22:10
  • Opening and closing file handles takes time. If I can keep all the handles open simultaneously, then I don't need to write and debug code to manage a smaller pool of handles that I am constantly opening and closing to stay under the limit. – Alex Reynolds Aug 05 '11 at 22:27

There is an OS-imposed limit. Note that stdin, stdout, and stderr all count as file descriptors, so they consume three slots before your script opens anything. The default FD limit on Linux is 1024 per process.

Note that the hard limit on most Linuxes I've used is 1024. Check /etc/security/limits.conf (path might depend on your distro) to see if you can increase it.

You might also consider rewriting the script so that it doesn't need all of these files open at once. Either load all the data in, or provide a lazy-loading mechanism so that you load data when you need it and then close the file.
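
A minimal sketch of such a lazy pool (the names, cap, and eviction policy here are illustrative, not a drop-in for the asker's script): handles are opened on first use and the least recently used one is closed when the pool hits its cap.

use strict;
use warnings;

my %pool;           # path => open filehandle
my %seen;           # paths we have opened at least once
my @lru;            # open paths, least recently used first
my $maxOpen = 512;  # hypothetical cap, safely under the OS limit

sub getHandle {
    my ($path) = @_;
    if ($pool{$path}) {
        @lru = (grep({ $_ ne $path } @lru), $path);  # mark as recently used
        return $pool{$path};
    }
    if (keys(%pool) >= $maxOpen) {
        my $victim = shift @lru;    # evict the least recently used handle
        close $pool{$victim};
        delete $pool{$victim};
    }
    my $mode = $seen{$path}++ ? ">>" : ">";  # truncate on first open, append after
    open my $fh, $mode, $path or die "could not open $path: $!";
    $pool{$path} = $fh;
    push @lru, $path;
    return $fh;
}

# usage: print { getHandle($destFn) } $line;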

cdhowie
  • I'm working with files of a size larger than I can fit into memory. The idea is to stream through the file one line at a time and split it into a large set of much smaller files. If I keep a pool of handles open, I can quickly write to each one as I process each line of the larger file. – Alex Reynolds Aug 05 '11 at 20:52