3

I'm trying to read a file that uses only CR as its line delimiter. I'm using Mac OS X and Perl v5.8.8. The script should run on every platform and handle every kind of line delimiter (CR, LF, CRLF).

My current code is the following:

open(FILE, "test.txt");

while($record = <FILE>){
    print $record;
}

close(FILE);

This currently prints only the last line (or worse). What is going on? Obviously, I would prefer not to convert the file. Is that possible?

Zaid
subb
  • I'm going to be annoying and suggest that you use the `strict` and `warnings` pragmas; they will save you hours of debugging. Also, it is best to use the modern three-argument form of open with lexical filehandles. See http://stackoverflow.com/questions/1479741/why-is-three-argument-open-calls-with-lexical-filehandles-a-perl-best-practice for more info. – daotoad Jun 10 '10 at 22:32

2 Answers

20

You can set the delimiter using the special variable $/:

local $/ = "\r"; # CR; use "\r\n" for CRLF or "\n" for LF
my $line = <FILE>;

See perldoc perlvar for further information.
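As a minimal sketch of the `$/` approach, the script below writes a small CR-delimited file and reads it back. The helper name `read_records` and the temporary file are assumptions made for illustration; they are not part of the question's code. The records are printed with `"\n"` appended because a bare CR moves the terminal cursor back to the start of the line, so printing CR-terminated records unchanged makes each one overwrite the previous one.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $path using $sep as the input record separator.
sub read_records {
    my ($path, $sep) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);              # avoid CRLF translation on Windows
    local $/ = $sep;           # restored automatically when the sub returns
    my @records = <$fh>;
    chomp @records;            # chomp removes the current value of $/
    close($fh);
    return @records;
}

# Demo: write a small CR-delimited file, then read it back.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "one\rtwo\rthree\r";
close($tmp);

my @records = read_records($path, "\r");
print "$_\n" for @records;     # one record per visible line
```

Passing `"\r\n"` or `"\n"` as the second argument handles the other two conventions, provided the delimiter is known in advance.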

Another solution that works with all kinds of linebreaks would be to slurp the whole file at once and then split it into lines using a regex:

local $/ = undef;
my $content = <FILE>;
my @lines = split /\r\n|\n|\r/, $content;

You shouldn't do that with very large files though, as the file is read into memory completely. Note that setting $/ to the undefined value disables the line delimiter, meaning that everything is read until the end of the file.
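The slurp-and-split idea can be sketched as a self-contained script. It assumes the simplified pattern `/\r\n?|\n/` suggested in the comments, and it reads from an in-memory filehandle instead of a real file so that it runs anywhere; `split_lines` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string on CRLF, CR, or LF.
sub split_lines {
    my ($content) = @_;
    return split /\r\n?|\n/, $content;
}

# Sample data mixing all three delimiters, read through an in-memory handle.
my $data = "a\r\nb\rc\nd\n";
open(my $fh, '<', \$data) or die "Can't open in-memory handle: $!";
my $content = do { local $/; <$fh> };   # slurp: $/ is undef inside the block
close($fh);

my @lines = split_lines($content);
print scalar(@lines), " lines: @lines\n";   # prints "4 lines: a b c d"
```

Note that `split` drops trailing empty fields, so a final delimiter does not produce an extra empty line, while empty lines in the middle of the file are kept.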

jkramer
  • "\r" is just an example for CR, you might want to try "\r\n" and "\n" for CRLF and LF respectively. – jkramer Jun 10 '10 at 20:13
  • 1
    Oh I see. CR and Terminal don't play well together. – subb Jun 10 '10 at 20:21
  • Your split has a bug. Perl will use the first matching branch in an alternation and only try later branches if it can't satisfy the full pattern. So if `$content` is `"a\r\nb"`, the resulting lines will be `('a', '', 'b')`. Rearranging the alternation to `/\r\n|\r|\n/` will produce the desired result; it can be simplified further to `/\r\n?|\n/`. – Ven'Tatsu Jun 11 '10 at 15:00
  • You're right, I fixed it. I usually just use /\r?\n/, but that wouldn't work with CR linebreaks. However, I've never seen CR-only linebreaks being used in practice before. – jkramer Jun 11 '10 at 15:35
  • See [my answer here](https://stackoverflow.com/a/56703060/111036) for an alternative using `\R` to split on any of the 3 newline delimiters. (And yes, unfortunately, there are still Mac programs using CR 20 years after it has been deprecated) – mivk Oct 05 '20 at 19:31
1

I solved a more general problem that could be useful here:

How to parse a big file line by line when the line delimiter (CR, CRLF, or LF) is not known beforehand.

A 'big' file means one that is too large to read into a single variable. The function 'detectEndOfLine' takes the name of a file and returns either '\r' or '\n', whichever it finds first when searching character by character backwards from the end of the file. For a CRLF file it returns '\n', so the read loop also strips the leftover '\r' from each line.

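use strict;
use warnings;

my $file = "test.txt";
local $/ = detectEndOfLine($file);
open(my $in, '<', $file) or die "Can't open file \"$file\" for reading: $!\n";
binmode($in);   # no CRLF translation on Windows
while (my $line = <$in>) {
    $line =~ s/(?:\r\n|\n|\r)\z//;   # strip only the trailing terminator
    print "$line\n";
}
close($in);

# Scan backwards from the end of the file and return the first "\r" or "\n"
# found, or undef if the file contains neither.
sub detectEndOfLine {
    my ($file) = @_;
    my $size = (-s $file) || 0;

    open(my $fh, '<', $file) or die "Can't open file \"$file\" for reading: $!\n";
    binmode($fh);
    for (my $i = $size - 1; $i >= 0; --$i) {
        seek($fh, $i, 0);
        read($fh, my $sym, 1);       # read exactly one character
        return $sym if $sym eq "\n" or $sym eq "\r";
    }
    return undef;
}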
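A variant worth considering scans forward from the start of the file instead and reports CRLF as its own case. This is a different technique from the backward scan above, shown only as a sketch: the name `detect_eol_from_start` and the 4096-byte default chunk size are assumptions, and files with mixed line endings are not handled.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return "\r\n", "\r", or "\n" for the first terminator
# found in $path, or undef if the file contains none.
sub detect_eol_from_start {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 4096;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);
    my $buf = '';
    while (read($fh, $buf, $chunk_size, length $buf)) {
        # Require one character after a "\r" so that a CRLF pair straddling
        # a chunk boundary is not mistaken for a bare CR.
        if ($buf =~ /(\r\n|\n|\r(?=.))/s) {
            close($fh);
            return $1;
        }
        $buf = substr($buf, -1);    # keep only a possible trailing "\r"
    }
    close($fh);
    return "\r" if $buf =~ /\r\z/;  # the file ends with a bare CR
    return;
}

# Demo: one temporary file per convention.
for my $eol ("\r\n", "\r", "\n") {
    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode($tmp);
    print {$tmp} join($eol, qw(one two three)), $eol;
    close($tmp);
    my $found = detect_eol_from_start($path);
    print defined $found && $found eq $eol ? "ok\n" : "mismatch\n";
}
```

The returned value can be assigned to `$/` directly, after which `chomp` removes the full terminator and no substitution is needed in the read loop.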
dmitry