Please note: I am not looking for the "right" way to open/read a file, or the way I should open/read a file every single time. I am just interested to find out what way most people use, and maybe learn a few new methods at the same time :)

A very common block of code in my Perl programs is opening a file and reading from or writing to it. I have seen many ways of doing this, and my style of performing this task has changed a few times over the years. I'm wondering what the best method (if there is a best one) is.

I used to open a file like this:

my $input_file = "/path/to/my/file";
open INPUT_FILE, "<$input_file"  || die "Can't open $input_file: $!\n";

But I think that has problems with error trapping.

Adding parentheses seems to fix the error trapping:

open (INPUT_FILE, "<$input_file")  || die "Can't open $input_file: $!\n";

I know you can also assign a filehandle to a variable, so instead of using "INPUT_FILE" like I did above, I could have used $input_filehandle - is that way better?

For reading a file, if it is small, is there anything wrong with slurping the whole thing at once, like this?

my @array = <INPUT_FILE>;

or

my $file_contents = join( '', <INPUT_FILE> );   # each line already ends in a newline

or should you always loop through, like this:

my @array;
while (<INPUT_FILE>) {
  push(@array, $_);
}

I know there are many ways to accomplish things in Perl; I'm just wondering whether there are preferred or standard methods of opening and reading a file.

Asked by BrianH (edited by Peter Mortensen)

12 Answers

There are no universal standards, but there are reasons to prefer one or another. My preferred form is this:

open( my $input_fh, "<", $input_file ) || die "Can't open $input_file: $!";

The reasons are:

  • You report errors immediately. (Replace "die" with "warn" if that's what you want.)
  • Your filehandle is now reference-counted, so once nothing refers to it any more (for example, when it goes out of scope) it is closed automatically. If you use a global bareword name such as INPUT_FILE, you have to close the file manually or it stays open until the program exits.
  • The read-mode indicator "<" is separated from the $input_file, increasing readability.
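
A minimal sketch of the reference-counting point, assuming a throwaway file created with the core File::Temp module (the demo.txt name is made up): the handle is never closed explicitly, yet its output is on disk as soon as the block that declared it ends.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory for the demo; removed again when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'demo.txt' );

{
    # $out_fh exists only inside this block.
    open( my $out_fh, '>', $path ) || die "Can't open $path for writing: $!";
    print {$out_fh} "hello\n";

    # No explicit close here: when the block ends, the last reference to
    # $out_fh disappears and Perl closes (and therefore flushes) the file.
}

# The data is already on disk, so reading it back sees the whole line.
open( my $in_fh, '<', $path ) || die "Can't open $path: $!";
my $content = do { local $/; <$in_fh> };
close($in_fh);

print "read back: $content";
```

With a bareword handle, the same code would leave the file open (and possibly unflushed) until an explicit close or program exit.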

The following is great if the file is small and you know you want all lines:

my @lines = <$input_fh>;

You can even do this, if you need to process all lines as a single string:

my $text = join('', <$input_fh>);

For long files you will want to iterate over lines with while, or use read.
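
A rough sketch of both approaches, using an in-memory filehandle (a scalar reference passed to open) in place of a real long file; the 8-byte chunk size is arbitrary and only there so the loop runs more than once.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for a real path here; open() accepts a
# scalar reference as the file name (Perl 5.8 and later).
my $sample = "first line\nsecond line\nthird line\n";

# 1. Line by line: only one line is held in memory at a time.
open( my $lines_fh, '<', \$sample ) || die "Can't open in-memory file: $!";
my $line_count = 0;
while ( my $line = <$lines_fh> ) {
    chomp $line;    # drop the trailing newline
    $line_count++;
    print "line $line_count: $line\n";
}
close($lines_fh);

# 2. Fixed-size chunks with read(), useful for binary data or files
#    without meaningful line breaks.
open( my $chunk_fh, '<', \$sample ) || die "Can't open in-memory file: $!";
my $total_bytes = 0;
while (1) {
    my $got = read( $chunk_fh, my $buffer, 8 );
    die "read failed: $!" unless defined $got;    # undef means an error
    last if $got == 0;                             # 0 means end of file
    $total_bytes += $got;
}
close($chunk_fh);

print "read $total_bytes bytes in chunks\n";
```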

Answered by JSBձոգչ (edited by rjh)
  • or slight variation... open my $input_fh, '<', $input_file or die "Can't open $input_file: $!"; – draegtun Nov 25 '08 at 22:34
  • I still think that this is boilerplate. Just use `File::Slurp` or `Tie::File`. – Svante Nov 29 '08 at 13:11
  • Also consider `use autodie;` which will make your IO operations fatal by default. Easier than writing 'or die' everywhere. – rjh Feb 22 '13 at 22:17
  • A few more reasons why this is good: 1) file handle is in lexical scope, not package (global), so you're less likely to have other code use it by accident 2) you can easily pass the file handle to subroutines without messing with typeglobs 3) separating the read mode indicator "<" is about more than just readability; it prevents nasty effects if a filename starts (for example) with a ">" character. – Lqueryvg Jul 30 '14 at 20:18
  • Is it fine to close it anyway when I'm done using it? What happens if I call `close($input_fh)`? – Agostino Jul 19 '21 at 17:00
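
To illustrate the second point in the comment above, and the question about close, here is a sketch; count_lines is a hypothetical helper, and the file comes from File::Temp rather than a real input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: the lexical filehandle arrives as an ordinary
# scalar argument, with no typeglobs involved.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++;
    }
    return $count;
}

# Throwaway three-line file for the demo.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "a\nb\nc\n";
close($tmp_fh) || die "Can't close $tmp_path: $!";

open( my $input_fh, '<', $tmp_path ) || die "Can't open $tmp_path: $!";
my $lines = count_lines($input_fh);

# An explicit close is harmless: it releases the file immediately, and its
# return value reports errors (which matters most for files you write to).
close($input_fh) || die "Can't close $tmp_path: $!";

print "$tmp_path has $lines lines\n";
```

After the explicit close, $input_fh simply refers to a closed handle; when it later goes out of scope nothing further happens.
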
If you want the entire file as a single string, there's no need to iterate through it.

use strict;
use warnings;
use Carp;
use English qw( -no_match_vars );
my $data = q{};
{
   local $RS = undef;    # slurp mode: read the whole file as a single record
   my $fh;
   croak "Can't open $input_file: $!\n" if not open $fh, '<', $input_file;
   $data = <$fh>;
   croak 'Some Error During Close :/ ' if not close $fh;
}

The above satisfies perlcritic --brutal, which is a good way to test for 'best practices' :). $input_file still has to be declared and set before this block (under use strict the snippet won't compile without it), but the rest is kosher.

Answered by Kent Fredric (edited by Peter Mortensen)
  • What does local $RS = undef; do? – Nathan Garabedian Nov 06 '12 at 20:52
  • `$RS` is the same as `$/`, which `English` sets up for you. `$/` is the input record separator used by `<$fh>` (the same operation as `readline`, or `$fh->getline()`). Essentially, it holds the value the internal read algorithm uses to know when it has read a full "line" of data; setting it to `undef` means "there is no marker that ends a line", so the whole file is read in as a single "line". – Kent Fredric Nov 18 '12 at 08:15
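
A small sketch of what the comment describes, counting how many records <$fh> returns for two values of $/; the in-memory filehandle and the count_records helper are only for the demonstration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Hypothetical helper: count how many records <$fh> returns for a given $/.
sub count_records {
    my ($separator) = @_;
    open( my $fh, '<', \$text ) || die "Can't open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;
    close($fh);
    return scalar @records;
}

my $by_line = count_records("\n");     # default: one record per line
my $slurped = count_records(undef);    # undef: no separator, whole file is one record

print "with \$/ = \"\\n\": $by_line records\n";
print "with \$/ = undef: $slurped record\n";
```
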
Having to write 'or die' everywhere drives me nuts. My preferred way to open a file looks like this:

use autodie;

open(my $image_fh, '<', $filename);

While that's very little typing, a lot of important things are going on:

  • We're using the autodie pragma, which means that many of Perl's built-ins (open and close among them) will throw an exception if something goes wrong. It eliminates the need for writing or die ... in your code, it produces friendly, human-readable error messages, and it has lexical scope. It's available from the CPAN, and has been included with Perl itself since 5.10.1.

  • We're using the three-argument version of open. It means that even if we have a funny filename containing characters such as <, > or |, Perl will still do the right thing. In my Perl Security tutorial at OSCON I showed a number of ways to get 2-argument open to misbehave. The notes for this tutorial are available for free download from Perl Training Australia.

  • We're using a scalar file handle. This means that we're not going to accidentally close someone else's file handle of the same name, which can happen if we use package file handles. It also means strict can spot typos, and that our file handle will be cleaned up automatically when it goes out of scope.

  • We're using a meaningful file handle name. In this case it looks like we're going to read from an image.

  • The file handle ends with _fh. If we ever see it being used like a regular scalar, we know that's probably a mistake.
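
A sketch of what an autodie failure looks like when you want to trap it rather than let the program die; the missing path is invented so that open fails, and $@ here is the autodie::exception object that autodie throws.

```perl
use strict;
use warnings;
use autodie;    # open, close and friends now die on failure
use File::Spec;

# A path that (almost certainly) does not exist, chosen for the demo.
my $filename = File::Spec->catfile( File::Spec->tmpdir, 'no-such-dir-xyz', 'image.png' );

my $opened = eval {
    open( my $image_fh, '<', $filename );    # no "or die" needed
    close($image_fh);
    1;
};

if ( !$opened ) {
    # $@ holds an autodie::exception object; it stringifies to a readable
    # message naming the failed call, the file and the system error.
    print "caught: $@";
}
```

Without the eval, the same failure would simply end the program with that message.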

Answered by pjf (edited by Peter Mortensen)
  • Great insight, thanks! I also had never seen the 3 argument method to "open" - I think I like that way! Thanks! – BrianH Nov 29 '08 at 16:09
If your files are small enough that reading the whole thing into memory is feasible, use File::Slurp. It reads and writes full files with a very simple API, plus it does all the error checking so you don't have to.

Answered by Dave Rolsky
  • File::Slurp is awesome, but it is a lot slower than Kent Fredric's direct read. (~4000 10-30k files in 7s direct vs 56s slurped per nytprof) – Bill Ruppert Dec 25 '11 at 15:25
There is no best way to open and read a file. It's the wrong question to ask. What's in the file? How much data do you need at any point? Do you need all of the data at once? What do you need to do with the data? You need to figure those out before you think about how you need to open and read the file.

Is anything that you are doing now causing you problems? If not, don't you have better problems to solve? :)

Most of your question is merely syntax, and that's all answered in the Perl documentation (especially perlopentut). You might also like to pick up Learning Perl, which answers most of the questions you raise here.

Good luck. :)

Answered by brian d foy
  • So maybe I shouldn't have asked what the best way to open/read a file is, but what most people do. I've written hundreds of Perl programs that open files, and I just want to make sure I'm going about it in a good way. I haven't had any problems; I'm just curious how other people do it. Thanks! – BrianH Nov 25 '08 at 23:27
  • Again, read the first paragraph. The best way depends on what you are doing. – brian d foy Nov 29 '08 at 18:45
  • I'm not saying Perl::Critic is law, but many of the ways to open files in "Learning Perl" do not pass Perl::Critic. In fact, the way I used to open files all the time is the way I learned it in "Learning Perl". I would argue that best practices can be applied to most situations where a file needs to be opened, and that you don't need to know the tiny details; otherwise I would have asked, "What's the best way to open a binary file and count the bytes?" or something like that. 99% of the files I open are plain text, and I just want to read them into an array. I'm interested in best practice. – BrianH Feb 04 '10 at 20:17
  • You must have an old Learning Perl then. – brian d foy Feb 04 '10 at 22:00
  • That could be: 3rd edition, 2002. I'll have to look for a newer version. – BrianH Feb 09 '10 at 14:06
  • You don't have to look hard to find the Fifth edition, which is the current one. – brian d foy Feb 09 '10 at 20:14
  • Went out and bought the 5th edition, and I'm glad I did; there are some great updates! Critic doesn't appreciate bareword file handles, and I don't see examples in the 5th edition where non-bareword file handles are used. Again, I'm not saying Critic is the law, but that was the point of my question: PBP/Critic says to use 3-argument opens and non-bareword file handles. Learning Perl does not show examples of non-bareword file handles (that I can find). I'm not saying either way is right or wrong, but my question was to see how most people do it. – BrianH Mar 03 '10 at 14:40
For OO, I like:

use Carp;          # croak() comes from Carp
use FileHandle;
...
my $handle = FileHandle->new( "< $file_to_read" );
croak( "Could not open '$file_to_read': $!" ) unless $handle;
...
my $line1 = <$handle>;
my $line2 = $handle->getline;
my @lines = $handle->getlines;
$handle->close;

Answered by Axeman
  • Yes, it will work with the "iteration operator", but you can also read it with $handle->getline or $handle->getlines – Axeman Nov 26 '08 at 18:10
It's true that there are as many best ways to open a file in Perl as there are

$files_in_the_known_universe * $perl_programmers

...but it's still interesting to see who usually does it which way. My preferred form of slurping (reading the whole file at once) is:

use strict;
use warnings;

use IO::File;

my $file = shift @ARGV or die "what file?";

my $fh = IO::File->new( $file, '<' ) or die "$file: $!";
my $data = do { local $/; <$fh> };
$fh->close();

# If you didn't just run out of memory, you have:
printf "%d characters (possibly bytes)\n", length($data);

And when going line-by-line:

my $fh = IO::File->new( $file, '<' ) or die "$file: $!";
while ( my $line = <$fh> ) {
    print "Better than cat: $line";
}
$fh->close();

Caveat lector of course: these are just the approaches I've committed to muscle memory for everyday work, and they may be radically unsuited to the problem you're trying to solve.

I once used the

open (FILEIN, "<", $inputfile) or die "...";
my @FileContents = <FILEIN>;
close FILEIN;

boilerplate regularly. Nowadays, I use File::Slurp for small files that I want to hold completely in memory, and Tie::File for big files that I want to scalably address and/or files that I want to change in place.
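
A minimal sketch of the Tie::File approach (Tie::File ships with Perl), assuming a small throwaway file: each array element maps to one line, and assigning to an element rewrites that line in the file.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Throwaway file for the demo.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "alpha\nbeta\ngamma\n";
close($tmp_fh) || die "Can't close $path: $!";

# Each element of @lines is one line of the file (without its newline).
# Tie::File reads lazily, so a huge file is not pulled into memory at once.
tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";

my $count = scalar @lines;
$lines[1] = 'BETA';      # rewrites the second line in place
push @lines, 'delta';    # appends a new line

untie @lines;            # flush changes and release the file

open( my $check_fh, '<', $path ) || die "Can't open $path: $!";
my @after = <$check_fh>;
close($check_fh);
print "was $count lines, now:\n", @after;
```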

Answered by Svante
Read the entire file $file into the variable $text with a single line:

$text = do {local(@ARGV, $/) = $file ; <>};

or as a function:

$text = load_file($file);
sub load_file {local(@ARGV, $/) = @_; <>}
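
The one-liner works because <> reads from the files named in @ARGV, and local puts both @ARGV and $/ back when the enclosing block or sub exits. A sketch that checks this, assuming a temporary file from File::Temp and a made-up 'left-alone' argument:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same function as above.
sub load_file { local ( @ARGV, $/ ) = @_; <> }

my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line 1\nline 2\n";
close($tmp_fh) || die "Can't close $file: $!";

@ARGV = ('left-alone');    # pretend the script got a command-line argument
my $text = load_file($file);

# local restored both globals when load_file returned.
print "slurped ", length($text), " characters\n";
print "\@ARGV is still (@ARGV); \$/ is ", ( $/ eq "\n" ? 'a newline again' : 'something else' ), "\n";
```

One caveat: <> opens the @ARGV entries with the two-argument form of open, so the odd-filename concerns raised elsewhere on this page apply; Perl 5.22 added the <<>> operator, which uses the three-argument form instead.
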
If these programs are just for your productivity, whatever works! Build in as much error handling as you think you need.

Reading a large file in all at once may not be the best long-term approach, so you may want to process lines as they come in rather than load them all into an array.

One tip I got from one of the chapters in The Pragmatic Programmer (Hunt & Thomas) is that you might want to have the script save a backup of the file for you before it goes to work slicing and dicing.
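
A minimal sketch of that tip using the core File::Copy module; backup_file and the .bak suffix are hypothetical choices, not something the book prescribes.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $path to "$path.bak" before touching it.
sub backup_file {
    my ($path) = @_;
    my $backup = "$path.bak";
    copy( $path, $backup ) or die "Can't back up $path to $backup: $!";
    return $backup;
}

# Demo file standing in for the data you are about to slice and dice.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'data.txt' );
open( my $seed_fh, '>', $path ) || die "Can't open $path: $!";
print {$seed_fh} "original contents\n";
close($seed_fh) || die "Can't close $path: $!";

my $backup = backup_file($path);

# With a backup in place, it's safer to overwrite the original.
open( my $out_fh, '>', $path ) || die "Can't open $path: $!";
print {$out_fh} "edited contents\n";
close($out_fh) || die "Can't close $path: $!";

print "backup kept at $backup\n";
```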

Answered by John
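The || operator has higher precedence than the commas in open's argument list, so "<$input_file" || die ... is evaluated first and its result is passed to open. Because that string is always true, the die can never run, whatever open returns. Use the low-precedence "or" operator instead, and you won't have that problem: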

open INPUT_FILE, "<$input_file"
  or die "Can't open $input_file: $!\n";
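
A sketch that shows the difference, assuming a path that does not exist; it uses the three-argument form, where $input_file || die groups exactly as "<$input_file" || die does in the question.

```perl
use strict;
use warnings;

# Hypothetical path that should not exist.
my $input_file = '/no/such/dir/file.txt';

# Parsed as: open my $fh_a, '<', ($input_file || die ...)
# The file name is a true string, so die() never runs and the failed
# open() goes unnoticed.
my $pipes_survived = eval {
    open my $fh_a, '<', $input_file || die "Can't open $input_file: $!\n";
    1;
};

# Parsed as: (open my $fh_b, '<', $input_file) or die ...
# "or" binds more loosely than the list operator, so it tests open()'s result.
my $or_survived = eval {
    open my $fh_b, '<', $input_file or die "Can't open $input_file: $!\n";
    1;
};

print "with ||: ", ( $pipes_survived ? "failure missed\n" : "failure caught\n" );
print "with or: ", ( $or_survived    ? "failure missed\n" : "failure caught\n" );
```
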

Answered by Ape-inago (edited by Peter Mortensen)
Damian Conway does it this way:

$data = readline!open(!((*{!$_},$/)=\$_)) for "filename";

But I don't recommend that to you.

Answered by ysth
  • It sets $/ to undef (slurp mode) and assigns \$_ to *{""}; assigning a reference to a glob just replaces the slot of that reference's type, so ${""} is an alias to $_ (which has the value "filename"). The ! negates the value of the assignment (1, since a list assignment in scalar context gives the number of elements on the right of the assignment), so it is false. open treats the false value as "", so it opens the *{""} filehandle, and one-argument open gets the filename to open from the scalar of the glob. If open returns true, the readline also treats the false given by the ! as the *{""} filehandle. – ysth Jan 08 '17 at 02:08