How to remove new line characters until each line has a specific number of instances of a specific character?

Question

I have a real mess of a pipe-delimited file, which I need to load into a database. The file has 35 fields, and thus 34 pipes per record. One of the fields consists of HTML code which, for some records, includes multiple line breaks. Unfortunately there's no pattern as to where the line breaks fall.

The solution I've come up with is to count the number of pipes in each line and until that number reaches 34, remove the new line character from that line. I'm not incredibly well-versed in Perl, but I think I'm close to achieving what I'm looking to do. Any suggestions?

#!/usr/local/bin/perl

use strict;

open (FILE, 'test.txt');

while (<FILE>) {
    chomp;
    my $line = $_;
    #remove null characters that are included in file
    $line =~ tr/\x00//;
    #count number of pipes
    my $count = ($line =~ tr/|//);
    #each line should have 34 pipes
    while ($count < 34) {
        #remove new lines until line has 34 pipes
        $line =~ tr/\r\n//;
        $count = ($line =~ tr/|//);
        print "$line\n";
    }
}

See also: http://stackoverflow.com/questions/6075327/how-do-i-handle-store-multiple-lines-into-a-single-field-read-from-a-file-in-perl — mob, May 20 '11 at 18:59

mob · Answer 1 · 2011-01-21T21:45:58.923

1

Twiddle with $/, the input record separator?

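while (!eof(FILE)) {

    # assemble a row of data: 35 pipe-separated fields, possibly over many lines
    my @fields = ();
    {
        # read the first 34 fields from FILE; each one ends at a pipe
        local $/ = '|';
        for (1..34) {
            my $field = <FILE>;
            last unless defined $field;   # input ended in the middle of a row
            chomp $field;                 # chomp strips the current $/, i.e. the trailing '|'
            push @fields, $field;
        }
    }   # $/ is set back to its original value ("\n") at the end of this block

    my $last = <FILE>;                    # read the last field, which ends with a newline
    $last = '' unless defined $last;
    chomp $last;
    push @fields, $last;
    tr/\r\n//d for @fields;               # delete line breaks embedded in the fields
    my $line = join '|', @fields;
    # now you can process $line, and you already have the @fields
}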
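
For anyone who wants to try the $/ idea without a file on disk, here is a minimal sketch of it as a subroutine reading from an in-memory filehandle. The name read_record and the 3-field sample data are made up for illustration; the real file would use 35 fields. It assumes that no field contains a literal pipe and that the last field of a record has no embedded line break.

```perl
use strict;
use warnings;

# Hypothetical helper: read one logical record of $nfields pipe-delimited
# fields from the open handle $fh, even if it spans several physical lines.
# Returns an array reference, or undef when the input is exhausted.
sub read_record {
    my ($fh, $nfields) = @_;
    my @fields;
    {
        local $/ = '|';                     # all but the last field end at a pipe
        for (1 .. $nfields - 1) {
            my $field = <$fh>;
            return unless defined $field;   # ran out of input
            chomp $field;                   # strips the trailing '|' (the current $/)
            push @fields, $field;
        }
    }                                       # $/ is "\n" again here
    my $last = <$fh>;                       # the last field ends at the newline
    return unless defined $last;
    chomp $last;
    push @fields, $last;
    tr/\r\n//d for @fields;                 # delete line breaks embedded in the fields
    return \@fields;
}

# Sample data: 3 fields per record for brevity.
my $data = "a|<p>one\ntwo</p>|c\nd|e|f\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $rec = read_record($fh, 3)) {
    print join('|', @$rec), "\n";
}
close $fh;
```

With the sample data above this prints a|<p>onetwo</p>|c followed by d|e|f. Note that chomp removes whatever $/ currently is, which is why it strips the pipe inside the block and the newline outside it.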

edited Jan 21 '11 at 21:45

answered Jan 21 '11 at 20:07


Wow, interesting method that I never would have thought of. – user584982 Jan 21 '11 at 21:44

score 1 · Accepted Answer · answered Jan 21 '11 at 20:21

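This should work, I think. It keeps appending lines to a buffer until the running pipe count reaches 34, then prints the joined record.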

#!/usr/bin/perl

use strict;

open (FILE, 'test.txt') or die "Cannot open test.txt: $!";

my $num_pipes = 0, my $line_num = 0;
my $tmp = "";
while (<FILE>)
{
    $line_num++;
    chomp;
    my $line = $_;
    $line =~ tr/\x00//d; #remove null characters that are included in file (/d deletes; without it tr only counts)
    $num_pipes += ($line =~ tr/|//); #count number of pipes
    if ($num_pipes == 34 && length($tmp))
    {
            $tmp .= $line;
            print "$tmp\n";
            # Reset values.
            $tmp = "";
            $num_pipes = 0;
    }
    elsif ($num_pipes == 34 && length($tmp) == 0)
    {
            print "$line\n";
            $num_pipes = 0;
    }
    elsif ($num_pipes < 34)
    {
            $tmp .= $line;
    }
    elsif ($num_pipes > 34)
    {
            print STDERR "Error before line $line_num. Too many pipes ($num_pipes)\n";
            $num_pipes = 0;
            $tmp = "";
    }
}
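
The same accumulate-and-count idea can also be sketched as a subroutine that returns the joined records instead of printing them, which makes it easier to test. The name join_records and the sample data are hypothetical. It assumes that the physical line which brings the pipe count up to the target also ends the record, so a line break inside the last field would not be handled.

```perl
use strict;
use warnings;

# Hypothetical helper: join physical lines from $fh into records that each
# contain exactly $want pipes. Returns the list of completed records.
sub join_records {
    my ($fh, $want) = @_;
    my ($buf, $pipes, @records) = ('', 0);
    while (my $line = <$fh>) {
        $line =~ tr/\x00\r\n//d;          # /d deletes nulls and line breaks
        $pipes += ($line =~ tr/|//);      # no /d here: count pipes, leave them in place
        $buf .= $line;
        if ($pipes == $want) {            # record complete
            push @records, $buf;
            ($buf, $pipes) = ('', 0);
        }
        elsif ($pipes > $want) {          # malformed input: discard and resync
            warn "Too many pipes ($pipes) at input line $.\n";
            ($buf, $pipes) = ('', 0);
        }
    }
    warn "Incomplete record at end of input\n" if length $buf;
    return @records;
}

# Sample data: 2 pipes per record for brevity; the real file would use 34.
my $data = "a|<p>one\ntwo</p>|c\nd|e|f\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for join_records($fh, 2);
close $fh;
```

The two tr calls differ only in the /d modifier: with an empty replacement list and no /d, tr leaves the string untouched and just returns the number of matching characters, which is what makes it usable as a pipe counter.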
