131

I'm trying to open an .html file as one big long string. This is what I've got:

open(FILE, 'index.html') or die "Can't read file 'filename' [$!]\n";  
$document = <FILE>; 
close (FILE);  
print $document;

which results in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN

However, I want the result to look like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This way I can search the entire document more easily.

galoget
goddamnyouryan
  • 8
    Really should check what the definition of "Can't install" is; it's a common problem, and it's commonly an argument that doesn't need to be made. http://stackoverflow.com/questions/755168/perl-myths/755179#755179-Im-Not-Allowed-to-install-modules – Kent Fredric Jun 05 '09 at 04:47
  • 1
    I'm actually unable to modify anything on the entire server that this script is running on, apart from the script itself. – goddamnyouryan Jun 05 '09 at 16:34
  • So you aren't allowed to add any files, anywhere on the server? – Brad Gilbert Jul 31 '11 at 17:50
  • FatPack modules into your script? Also, it looks like you might be thinking of parsing HTML with regular expressions, don't. – MkV May 28 '13 at 20:09

18 Answers

102

I would do it like this:

my $file = "index.html";
my $document = do {
    local $/ = undef;
    open my $fh, "<", $file
        or die "could not open $file: $!";
    <$fh>;
};

Note the use of the three-argument version of open. It is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.
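As a minimal sketch of the same idea wrapped in a reusable subroutine: the name slurp_file and the throwaway temp file are assumptions made for illustration, not part of the original answer. It relies on the lexical filehandle being closed automatically when the subroutine returns.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    local $/;                        # undef record separator: read to end of file
    open my $fh, '<', $path
        or die "could not open $path: $!";
    return scalar <$fh>;             # $fh is closed when it goes out of scope
}

# Create a small throwaway file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<html>\n<head>\n</head>\n";
close $tmp_fh or die "could not close $tmp_name: $!";

my $document = slurp_file($tmp_name);
print length($document), " characters read\n";
```

Keeping both the open and the local $/ inside the subroutine means neither the filehandle nor the changed record separator leaks into the calling code.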

Peter Mortensen
Chas. Owens
  • 9
    This is probably the best non-cpan'd way to do it as it uses both the 3 argument open as well as keeping the INPUT_RECORD_SEPARATOR ($/) variable localized to the smallest required context. – Danny Jun 05 '09 at 17:13
  • No need to close `$fh`? – Ωmega Mar 31 '23 at 14:15
85

Add:

 local $/;

before reading from the file handle. See How can I read in an entire file all at once?, or

$ perldoc -q "entire file"

See Variables related to filehandles in perldoc perlvar and perldoc -f local.

Incidentally, if you can put your script on the server, you can have all the modules you want. See How do I keep my own module/library directory?.

In addition, Path::Class::File allows you to slurp and spew.

Path::Tiny gives even more convenience methods such as slurp, slurp_raw, slurp_utf8 as well as their spew counterparts.
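To illustrate what local $/ actually does, here is a small sketch that reads from an in-memory string (an assumption made only to keep the example self-contained). $/ is a global special variable, so it cannot be declared with my; local gives it a temporary value (undef when no value is assigned) and restores the previous one when the enclosing block exits.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for index.html.
my $html = "<html>\n<head>\n</head>\n";

open my $fh, '<', \$html or die "could not open in-memory file: $!";
my $document = do {
    local $/;      # $/ is undef only until the end of this block
    <$fh>;         # so this single read returns everything
};
close $fh;

# Outside the block $/ is "\n" again, so reads are line by line.
open my $again, '<', \$html or die "could not reopen in-memory file: $!";
my @lines = <$again>;
close $again;

printf "slurped %d characters, then read %d lines\n",
    length $document, scalar @lines;
```

Because the change is confined to the block, any other code that reads line by line keeps working as before.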

Peter Mortensen
Sinan Ünür
  • 36
    You should probably explain what effects localizing $/ is going to do as well as what its purpose is. – Danny Jun 05 '09 at 17:14
  • 12
    If you're not going to explain anything about localizing `$/`, you should probably add links for further information. – Brad Gilbert Jul 31 '11 at 17:52
  • 7
    A good step-by-step explanation of what { local $/; <$fh> } is doing is provided here: http://www.perlmonks.org/?node_id=287647 – dawez May 07 '12 at 07:48
  • Perhaps just say why you must use `local` and not `my`. – Geremia Mar 12 '16 at 19:47
  • @Geremia A discussion of scoping is beyond the scope of this answer. – Sinan Ünür Mar 14 '16 at 12:18
  • The first line from the Perl documentation for local: "You really probably want to be using my instead..." http://perldoc.perl.org/functions/local.html – HoldOffHunger Sep 30 '16 at 22:46
  • @HoldOffHunger So what? This is one of the places where you must use `local` ... Don't feel the need to comment or vote down an answer unless you are familiar with the programming language about which you are commenting. – Sinan Ünür Oct 01 '16 at 02:49
  • Do any of these things work easily from the command line? I was trying to remove some multi-line whitespace followed by a particular pattern like `perl -pe 's/\s+pattern//gs'` and getting frustrated until I realized `-p` means I'm only looking at one line at a time. Doh! – Sigfried Mar 20 '20 at 11:54
82

With File::Slurp:

use File::Slurp;
my $text = read_file('index.html');

Yes, even you can use CPAN.

Peter Mortensen
Quentin
  • The OP said he can't modify anything on the server. The "Yes, even you can use CPAN" link here shows you how to work around that limitation, in most cases. – Trenton Jul 29 '15 at 05:45
  • `Can't locate File/Slurp.pm in @INC (@INC contains: /usr/lib/perl5/5.8/msys` :( – Dmytro Aug 25 '16 at 20:53
  • 2
    @Dmitry — So install the module. There's an install instructions link on the metacpan page I linked to from this answer. – Quentin Aug 26 '16 at 15:17
60

All the posts are slightly non-idiomatic. The idiom is:

open my $fh, '<', $filename or die "error opening $filename: $!";
my $data = do { local $/; <$fh> };

In most cases there is no need to explicitly set $/ to undef; a bare local $/ already leaves it undefined.

jrockway
  • 3
    `local $foo = undef` is just the Perl Best Practice (PBP) suggested method. If we are posting snippets of code I'd think doing our best to make it clear would be A Good Thing. – Danny Jun 05 '09 at 17:17
  • 3
    Showing people how to write non-idiomatic code is a good thing? If I saw "local $/ = undef" in code I was working on, my first action would be to publicly humiliate the author on irc. (And I am generally not picky about "style" issues.) – jrockway Jun 05 '09 at 18:20
  • 1
    Ok, I'll bite: what exactly is mock-worthy about "local $/ = undef"? If your only answer is "It's non-idiomatic," then (a) I'm not so sure and (b) so what? I'm not so sure, because it's awfully damn common as a way to do this. And so what because it's perfectly clear and reasonably brief. You may be more picky about style issues than you think. – Telemachus Jun 06 '09 at 15:03
  • 1
    The key is that the "local $/" is part of a well-known idiom. If you are writing some random code and write "local $Foo::Bar = undef;", that is fine. But in this very special case, you might as well speak the same language as everyone else, even if it's "less clear" (which I don't agree with; the behavior of "local" is well-defined in this respect). – jrockway Jun 08 '09 at 08:26
  • 12
    Sorry, disagree. It is much more common to be explicit when you want to change the actual behavior of a magic variable; it is a declaration of intent. Even the documentation uses 'local $/ = undef' (see http://perldoc.perl.org/perlsub.html#Temporary-Values-via-local()) – Leonardo Herrera Jun 19 '09 at 15:11
  • Adding to Leonardo Herrera's comment -- It is easier to see that you want the value to be undef.. when you set it to undef.. as opposed to you forgot to set the value when you localized the variable. Compilers in other languages today are checking for whether or not some code path didn't set a value before using a variable; obviously depending on the fact that an uninitialized variable has the value undef breaks that. – Gerard ONeill Nov 16 '14 at 08:17
  • I think `open my $fh` part should be moved inside the `do` block to make Perl close the file automatically immediately after reading it. As the whole file is read at once, there's no need to keep the file open once the contents are in variable `$data`. – Mikko Rantalainen Aug 26 '22 at 07:44
19

From perlfaq5: How can I read in an entire file all at once?:


You can use the File::Slurp module to do it in one step.

use File::Slurp;

$all_of_it = read_file($filename); # entire file in scalar
@all_lines = read_file($filename); # one line per element

The customary Perl approach for processing all the lines in a file is to do so one line at a time:

open (INPUT, $file)     || die "can't open $file: $!";
while (<INPUT>) {
    chomp;
    # do something with $_
    }
close(INPUT)            || die "can't close $file: $!";

This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

@lines = <INPUT>;

you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.

You can read the entire filehandle contents into a scalar.

{
local(*INPUT, $/);
open (INPUT, $file)     || die "can't open $file: $!";
$var = <INPUT>;
}

That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:

$var = do { local $/; <INPUT> };

For ordinary files you can also use the read function.

read( INPUT, $var, -s INPUT );

The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
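A hedged sketch of the read approach with a lexical filehandle and error checks follows. The temp file and variable names are assumptions for illustration. Note that -s reports a size in bytes, so with an encoding layer or newline translation the number of characters actually read can be smaller than that size.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway input file so the sketch is self-contained.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "can't close $file: $!";

open my $in, '<', $file or die "can't open $file: $!";
my $size = -s $in;                      # size in bytes of the open file
my $got  = read($in, my $var, $size);   # returns count read, 0 at EOF, undef on error
defined $got or die "can't read $file: $!";
close $in or die "can't close $file: $!";

print "expected $size bytes, read $got\n";
```

Checking the return value of read is worthwhile because a short read is not an error as far as Perl is concerned.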

Starfish
brian d foy
8

A simple way is:

while (<FILE>) { $document .= $_ }

Another way is to change the input record separator "$/". You can do it locally in a bare block to avoid changing the global record separator.

{
    open(F, "filename");
    local $/ = undef;
    $d = <F>;
}
Peter Mortensen
  • 1
    There is a significant number of problems with both of the examples you gave. The main problem is that they are written in ancient Perl, I would recommend reading [Modern Perl](http://onyxneon.com/books/modern_perl/) – Brad Gilbert Jul 31 '11 at 17:48
  • @Brad, the comment was made years ago, the point still stands however. better is `{local $/; open(my $f, '<', 'filename'); $d = <$f>;}` – Joel Berger Jul 31 '11 at 18:18
  • @Joel that is only slightly better. You didn't check the output of `open` or the implicitly called `close`. `my $d = do{ local $/; open(my $f, '<', 'filename') or die $!; my $tmp = <$f>; close $f or die $!; $tmp}`. (That still has the problem that it doesn't specify the input encoding.) – Brad Gilbert Jul 31 '11 at 18:48
  • [`use autodie`](http://p3rl.org/autodie), the major improvement I meant to show was the lexical filehandle and the 3 arg open. Is there some reason you are `do`ing this? why not just dump the file into a variable declared before the block? – Joel Berger Jul 31 '11 at 19:29
8

Either set $/ to undef (see jrockway's answer) or just concatenate all the file's lines:

$content = join('', <$fh>);

It's recommended to use scalars for filehandles on any Perl version that supports it.

Eugene Yarmash
kixx
6

Use

 $/ = undef;

before $document = <FILE>;. $/ is the input record separator, which is a newline by default. By setting it to undef, you are saying there is no record separator, so a single read returns the rest of the file. This is called "slurp" mode.

Other solutions like undef $/ and local $/ (but not my $/) also leave $/ undefined and thus produce the same effect; local additionally restores the old value when the enclosing block exits.

Geremia
5

Another possible way:

open my $fh, '<', "filename" or die "Can't open filename: $!";
read $fh, my $string, -s $fh;   # read as many bytes as the file is long
close $fh;
Peter Mortensen
echo
3

You're only getting the first line from the diamond operator <FILE> because you're evaluating it in scalar context:

$document = <FILE>; 

In list/array context, the diamond operator will return all the lines of the file.

@lines = <FILE>;
print @lines;
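The difference between the two contexts can be seen in a short sketch. It reads from an in-memory string rather than a real file, which is an assumption made only to keep it self-contained.

```perl
use strict;
use warnings;

my $html = "<!DOCTYPE html>\n<html>\n<head>\n";

open my $fh, '<', \$html or die "could not open in-memory file: $!";
my $first = <$fh>;    # scalar context: one record (one line by default)
my @rest  = <$fh>;    # list context: every remaining line
close $fh;

print "scalar context gave: $first";
print "list context gave ", scalar(@rest), " more lines\n";

# Joining the pieces gives the one long string the question asks for.
my $document = join '', $first, @rest;
```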
Nathan
  • Oh, thanks, I hadn't heard "diamond operator" before and thought they both shared the same name. I will correct it above. – Nathan Feb 08 '10 at 20:36
2
open f, "test.txt";
$file = join '', <f>;

<f> returns a list of the lines in our file (if $/ has the default value "\n"), and join '' then glues that list into a single string.

Peter Mortensen
2

I would do it in the simplest way, so anyone can understand what happens, even if there are smarter ways:

my $text = "";
while (my $line = <FILE>) {
    $text .= $line;
}
SomethingSomething
  • All those string concatenations are going to be quite expensive. I'd avoid doing this. Why tear the data apart only to put it back together? – andru Jan 20 '17 at 13:10
2

This is more of a suggestion on how NOT to do it. I've just had a bad time finding a bug in a rather big Perl application. Most of the modules had their own configuration files. To read the configuration files as a whole, I found this single line of Perl somewhere on the Internet:

# Bad! Don't do that!
my $content = do{local(@ARGV,$/)=$filename;<>};

It reassigns the line separator as explained before. But it also reassigns @ARGV, so that the <> operator reads from $filename through the magic ARGV file handle instead of from STDIN.

This had at least one side effect that cost me hours to find: It does not close the implicit file handle properly (since it does not call close at all).

For example, doing that:

use strict;
use warnings;

my $filename = 'some-file.txt';

my $content = do{local(@ARGV,$/)=$filename;<>};
my $content2 = do{local(@ARGV,$/)=$filename;<>};
my $content3 = do{local(@ARGV,$/)=$filename;<>};

print "After reading a file 3 times redirecting to STDIN: $.\n";

open (FILE, "<", $filename) or die $!;

print "After opening a file using dedicated file handle: $.\n";

while (<FILE>) {
    print "read line: $.\n";
}

print "before close: $.\n";
close FILE;
print "after close: $.\n";

results in:

After reading a file 3 times redirecting to STDIN: 3
After opening a file using dedicated file handle: 3
read line: 1
read line: 2
(...)
read line: 46
before close: 46
after close: 0

The strange thing is that the line counter $. is increased by one for every file. It's not reset, and it does not contain the number of lines. And it is not reset to zero when opening another file until at least one line is read. In my case, I was doing something like this:

while($. < $skipLines) {<FILE>};

Because of this problem, the condition was false because the line counter was not reset properly. I don't know if this is a bug or simply wrong code... Also, calling close; or close STDIN; does not help.

I replaced this unreadable code by using open, string concatenation and close. However, the solution posted by Brad Gilbert also works since it uses an explicit file handle instead.

The three lines at the beginning can be replaced by:

my $content = do{local $/; open(my $f1, '<', $filename) or die $!; my $tmp1 = <$f1>; close $f1 or die $!; $tmp1};
my $content2 = do{local $/; open(my $f2, '<', $filename) or die $!; my $tmp2 = <$f2>; close $f2 or die $!; $tmp2};
my $content3 = do{local $/; open(my $f3, '<', $filename) or die $!; my $tmp3 = <$f3>; close $f3 or die $!; $tmp3};

which properly closes the file handle.

Peter Mortensen
jaw
1

I don't know if it's good practice, but I used to use this:

($a=<F>);
zawy
0

You could simply create a subroutine:

# Get file contents
sub gfc
{
    open my $fc, '<', $_[0] or die "Can't open $_[0]: $!";
    join '', <$fc>;
}
Sheldon Juncker
0
use Path::Tiny qw( path );
 
my $file = 'data.txt';
my $data = path($file)->slurp_utf8;

Slurp mode - reading a file in one step: https://perlmaven.com/slurp

user3439968
0

For a text-based file, without installing extra modules (only core ones, i.e. installed by default), you can try this:

use IO::File;
my $content = join '', IO::File->new($filename)->getlines;
Gea-Suan Lin
0

One more approach:

sub configure_logger ( ) {
  my @configuration = DATA -> getlines;
  my $configuration = join( '', @configuration );  # lines already end in "\n"
  Log::Log4perl -> init( \$configuration );
}

configure_logger();

my $logger = Log::Log4perl -> get_logger;

Here we read the file handle into an array (with the getlines method), and then convert the array's values into a string (using join).

getlines is a Perl file handle built-in method coming from the autoloaded IO::Handle class that allows us to treat file handles as objects.

DATA is a special file handle in Perl, but can refer to any other as well.

Elvin