0

I recognise this might be a duplicate but the size of the file I have to split requires a method with doesn't load the csv into memory before processing it. ie I'm looking for a line by line method to read and split and output my file. I I only need my output to be the last 3 field without the quotes and without the thousand delimiting comma.

I have a file of arcGIS coordinates which contain quotes and commas internal to the fields. Data example below.

"0","0","1","1","1,058.83","1,455,503.936","5,173,996.331"

I have been trying to do this using variations on split( '","' , $line);. Here'e my code.

use strict;
use warnings;

open (FH, '<', "DEM_Export.csv") or die "Can't open file DEM_Export.csv";

open (FH2, '>', "DEM_ExportProcessed.csv") or die "Can't open file DEM_ExportProcessed.csv"; 
print FH2 "EASTING, NORTHING, ELEVATION,\n";
my $count = 0;
foreach my $line (<FH>) {
    chomp;
    # if ($count == 0){next;}

    print $line, "\n";
    my @list = split( '","' , $line);
    print "1st print $list[5],$list[6],$list[4]\n";
    $list[4] =~ s/,//g;
    $list[5] =~ s/,//g;
    $list[6] =~ s/,//g;
    $list[4] =~ s/"//g;
    $list[5] =~ s/"//g;
    $list[6] =~ s/"//g;
    print "2nd print $list[5],$list[6],$list[4]\n";
    if ($count == 10) { 
        exit;
    }      
    my $string = sprintf("%.3f,%.3f,%.3f\n", $list[5],$list[6],$list[4]); 
    print FH2 $string;
    $count++;
}

close FH;
close FH2;

I'm getting close my my wits end with this and really need a solution. Any help will be gratefully received. Cheers

MicrobicTiger
  • 577
  • 2
  • 5
  • 21

2 Answers2

4

This is really very straightforward using the Text::CSV to handle the nastiness of CSV data

Here's an example, which works fine with the sample data you have shown. As long as your input file is plain ASCII and the rows are about the size you have shown it should work fine

It prints its output to STDOUT, so you'll want to use a command-line redirect to put it into the file you want

use strict;
use warnings 'all';

use Text::CSV;

my $csv_file = 'DEM_Export.csv';

open my $in_fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};

my $csv = Text::CSV->new({ eol => "\n" });

print "EASTING,NORTHING,ELEVATION\n";

while ( my $row = $csv->getline($in_fh) ) {

   $csv->print(\*STDOUT, [ map tr/,//dr, @$row[-2,-1,-3] ] );
}

output

1455503.936,5173996.331,1058.83
Borodin
  • 126,100
  • 9
  • 70
  • 144
  • Thanks Borodin, I wasn't brave enough to take on a new module straight up. In the end I have used this to solve the problem. – MicrobicTiger Mar 19 '16 at 23:12
  • 2
    @MicrobicTiger: That's understandable. If you're not certain what a module will do you may well spend hours getting into it only to find it's not what you needed. I posted this because I thought you may like to see how concise it could be. Your own code unnecessarily removes quotation marks from the data, which are already parsed out by the module on input. If your run time is longer than you would like then there's a [Text::CSV_XS](https://p3rl.org/Text::CSV_XS) that has the same API but is written partially in C so will run a little faster. – Borodin Mar 19 '16 at 23:17
  • 2
    @MicrobicTiger: I notice that you've also reordered your output fields from the final three in the original, so I've duplicated that in my own code. Please mark one of the solutions as accepted so that others can see that your problem has been resolved—your own is fine if that's the one that you prefer – Borodin Mar 19 '16 at 23:20
  • I wondered if I might have been pulling those quotes out unnecessarily, I have edited my code to reflect that. I would have chosen my own answer once I had posted it. It appears I have a 2 day lockout before I can do that. Appreciate your time teaching me! – MicrobicTiger Mar 20 '16 at 00:35
0

I guess I should have been braver and had a crack with Text::CSV to start with rather than asking a question. Many thanks to Сухой27 and choroba for pointing me in the right direction.

Here is the code I ended up with. Probably not the tidiest.

use strict;
use warnings;
use Text::CSV;

my $file  = "DEM_Export.csv";
my $file2 = "DEM_ExportProcessed.csv";

open (FH2, '>', $file2) or die "Can't open file $file2: $!";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
print "Starting file processing...\n";
my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
    my @fields = @$row;
    s/,//g for @fields[3..5];     
    my $string = sprintf("%.3f,%.3f,%.3f\n", $fields[4],$fields[5],$fields[3]); 
    print FH2 $string;
}
print "Finished!";
close FH2;  

Worked a treat! Thank you.

MicrobicTiger
  • 577
  • 2
  • 5
  • 21