I am trying to split several hundred lines read from a CSV file on ,. E.g.:

"Acme services","Sesame street","zip","0,56","2013-10-21"  
"Black adder, intra-national Association","shaftsville rd","zap code","0,50","2014-10-14"  

etc.

I could split the first row on a plain comma, but this would not work for the second row, which has a comma inside a quoted field. However, if I split on "," (quote, comma, quote), I would catch these cases. I could then remove the remaining " characters with a simple regex (e.g. $cols[$i] =~ s/\"+//g).

I have tried @cols = split(/\",\"/,$line), and I've tried split('","',$lines) and other variations, but every time I get the full $line in $cols[0], with $cols[1..n] empty.

Any help would be much appreciated! Thanks.

mpapec
Carl

1 Answer

Why not use Text::CSV? It takes care of edge cases such as commas inside quoted values, along with all sorts of other problems.

From the CPAN page:

use Text::CSV;

my @rows;
my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                or die "Cannot use CSV: ".Text::CSV->error_diag ();

open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
while ( my $row = $csv->getline( $fh ) ) {
    $row->[2] =~ m/pattern/ or next; # 3rd field should match
    push @rows, $row;
}
$csv->eof or $csv->error_diag();
close $fh;

$csv->eol ("\r\n");

open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
$csv->print ($fh, $_) for @rows;
close $fh or die "new.csv: $!";

EDIT: a worked example, assuming the two lines given in the question are in a.txt:

use strict;
use Text::CSV;

my @rows;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                or die "Cannot use CSV: ".Text::CSV->error_diag ();

open my $fh, "<:encoding(utf8)", "a.txt" or die "a.txt: $!";
while ( my $row = $csv->getline( $fh ) ) {

    foreach(@$row){
        print "$_\n";
    }
    print "\n";
}
$csv->eof or $csv->error_diag();
close $fh;

gives

Acme services
Sesame street
zip
0,56
2013-10-21

Black adder, intra-national Association
shaftsville rd
zap code
0,50
2014-10-14
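
If the fields are needed by index rather than in a loop, a minimal sketch along the following lines may help. It assumes the line is already in a scalar (the name $line and the array @cols are only illustrative) and relies on the module defaults, where the separator is a single comma and the quote character is a double quote, so sep_char does not need to be changed for this data.

```perl
use strict;
use warnings;
use Text::CSV;

# Defaults: sep_char is "," and quote_char is '"', which matches this data.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Illustrative input: the second line from the question.
my $line = '"Black adder, intra-national Association","shaftsville rd","zap code","0,50","2014-10-14"';

# parse() splits one line; fields() returns the values with quotes removed.
$csv->parse($line) or die "Parse failed: " . $csv->error_diag();
my @cols = $csv->fields();

# Access by index: $cols[$i] on the array, or $row->[$i] on an array ref
# such as the one returned by getline(). Note the $ sigil on the index.
for my $i (0 .. $#cols) {
    print "$i: $cols[$i]\n";
}
```

The commas inside "Black adder, intra-national Association" and "0,50" stay within their fields because they sit between quote characters, so the line yields five fields.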
KeepCalmAndCarryOn
  • `split /,/` is appropriate unless your data fields are quoted, as here. But the module is preferable for anything more complex. Note that `Text::CSV` will use `Text::CSV_XS` (the faster, C-based module) if it is installed. Otherwise it will use the similar pure-Perl version `Text::CSV_PP`. For most purposes either one is fine. – Borodin Oct 16 '14 at 01:09
  • @KeepCalmAndCarryOn Thanks. I have implemented Text::CSV successfully in my code. However, it is simply splitting on ',' as before. I have tried changing the separator to ",", i.e. my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => "\",\"" }); but have been unsuccessful in implementing this. Could you possibly spoon-feed me on how to do this? I didn't find the documentation terribly clear, perhaps because my Perl is fairly primitive. Thanks! – Carl Oct 16 '14 at 02:20
  • updated answer. What exactly are you looking to do? – KeepCalmAndCarryOn Oct 16 '14 at 02:47
  • Thanks, I've got it working using your foreach(@$row){. I had been having difficulty using $row->[i], although it's possible that the bug was elsewhere in my code. – Carl Oct 17 '14 at 20:06