3

I'm trying to read a binary file with the following code:

open(F, "<$file") || die "Can't read $file: $!\n";
binmode(F);
$data = <F>;
close F;

open (D,">debug.txt");
binmode(D);
print D $data;
close D;

The input file is 16 MB, but debug.txt is only about 400 KB. When I look at debug.txt in Emacs, the last two characters are ^A^C (SOH and ETX, according to Notepad++), although that same pattern is also present elsewhere in debug.txt. The next line in the input file does have a ^O (SI) character, and I think that's the first occurrence of that particular character.

How can I read in this entire file?

chris

3 Answers

5

If you really want to read the whole file at once, use slurp mode. Slurp mode can be turned on by setting $/ (which is the input record separator) to undef. This is best done in a separate block so you don't mess up $/ for other code.

my $data;
{
    open my $input_handle, '<', $file or die "Cannot open $file for reading: $!\n";
    binmode $input_handle;
    local $/;
    $data = <$input_handle>;
    close $input_handle;
}

open my $output_handle, '>', 'debug.txt' or die "Cannot open debug.txt for writing: $!\n";
binmode $output_handle;
print {$output_handle} $data;
close $output_handle;

Use my $data for a lexical and our $data for a global variable.
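
As a rough illustration of why the original code stopped early: in scalar context, <F> returns a single record, which ends at the first occurrence of $/ (a newline, byte 0x0A, by default), and binmode does not change that. Presumably the first 0x0A byte in the 16 MB input sits around the 400 KB mark. The sketch below uses an in-memory filehandle and made-up bytes (both are assumptions for illustration, not the asker's data) to show the difference between the two reads.

```perl
use strict;
use warnings;

# Made-up binary payload with a newline byte (0x0A) in the middle.
my $bytes = "\x01\x03\x0A\x0F\x00\xFF";

# Default $/ is "\n": a scalar-context read stops after the first 0x0A.
open my $in, '<', \$bytes or die "Cannot open in-memory handle: $!\n";
binmode $in;
my $first_record = <$in>;
close $in;

# With $/ localized to undef, the same read returns everything.
my $everything;
{
    open my $in2, '<', \$bytes or die "Cannot open in-memory handle: $!\n";
    binmode $in2;
    local $/;
    $everything = <$in2>;
    close $in2;
}

printf "record: %d bytes, slurp: %d bytes\n",
    length $first_record, length $everything;
```

The first read returns only the bytes up to and including the newline, while the slurp-mode read returns the full payload.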

daxim
MvanGeest
  • Edited in order to promote modern practices, see rationale at [Why is three-argument open calls with lexical filehandles a Perl best practice?](http://stackoverflow.com/questions/1479741/why-is-three-argument-open-calls-with-lexical-filehandles-a-perl-best-practice) and [What’s the best way to open and read a file in Perl?](http://stackoverflow.com/questions/318789/whats-the-best-way-to-open-and-read-a-file-in-perl). – daxim Aug 17 '10 at 13:53
  • @daxim - I wanted to suggest that check, but I felt it was the OP's own responsibility... :) – MvanGeest Aug 17 '10 at 13:53
  • We can't teach without leading with good role models and eradicating outdated code. :) – daxim Aug 17 '10 at 14:06
  • I sense I've just been called outdated :) In any case, the solution here was to undef $/. (This solution still fails to write out the complete file to debug.txt, but since the goal was to get all my data into $data, it's good enough for me.) Thanks. – chris Aug 17 '10 at 14:21
3

TIMTOWTDI.

File::Slurp is the shortest way to express what you want to achieve. It also has built-in error checking.

use File::Slurp qw(read_file write_file);
my $data = read_file($file, binmode => ':raw');
write_file('debug.txt', {binmode => ':raw'}, $data);

The IO::File API offers an object-oriented way to do the same thing. Note that input_record_separator is not supported on a per-handle basis (calling it on a handle changes the global $/), so $/ still has to be localized.

use IO::File qw();
my $data;
{
    my $input_handle = IO::File->new($file, 'r') or die "could not open $file for reading: $!";
    $input_handle->binmode;
    local $/;    # slurp mode; the previous value is restored when the block ends
    $data = $input_handle->getline;
}
{
    my $output_handle = IO::File->new('debug.txt', 'w') or die "could not open debug.txt for writing: $!";
    $output_handle->binmode;
    $output_handle->print($data);
}
daxim
  • Not so much concerned with elegance - this is a quick & dirty solution. But thanks for the education. – chris Aug 17 '10 at 14:24
  • In the second example why are you localising the code in blocks? – jmcnamara Sep 17 '10 at 12:05
  • When a handle variable goes out of scope, the attached file descriptor is closed automatically. A naked block is the most straightforward way to create such a scope. – daxim Sep 17 '10 at 12:33
0

I don't think this is about using slurp mode or not, but about correctly handling binary files.

Instead of

$data = <F>;

you should do

read(F, $buffer, 1024);

This reads at most 1024 bytes per call, so you have to either increase the length or read the whole file piece by piece in a loop.
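
The loop mentioned above could look something like the following sketch. The helper name slurp_in_chunks and the 64 KiB default chunk size are illustrative choices, not part of the original answer; it uses the four-argument form of read, whose last argument is the offset in the buffer at which new bytes are stored.

```perl
use strict;
use warnings;

# Read an entire binary file in fixed-size chunks.
# slurp_in_chunks and the 64 KiB default are hypothetical, for illustration.
sub slurp_in_chunks {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    open my $fh, '<', $path or die "Cannot open $path for reading: $!\n";
    binmode $fh;

    my $data = '';
    while (1) {
        # Store new bytes at the current end of $data. read returns the
        # number of bytes read, 0 at end of file, and undef on error.
        my $got = read($fh, $data, $chunk_size, length $data);
        die "Error reading $path: $!\n" unless defined $got;
        last if $got == 0;
    }
    close $fh;
    return $data;
}
```

Calling my $data = slurp_in_chunks($file); would then return the complete contents regardless of which bytes the file contains, since read does not look at $/ at all.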

golimar