
I have an input CSV file containing something like:

SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012

I need to strip the formatting (the embedded thousands separators) from the numeric values for another processing pipeline (PostgreSQL COPY).

Is there a text filter that will split the columns on the field separator without also splitting on commas inside quoted strings? At present I get:

$ tail +2 /tmp/foo.csv|awk -F, '{print NF}'
3
4

And cut similarly yields partial field values.

I have to stay on Linux.

Thanks.

Dinesh
  • `print NF` prints the number of fields. If you want to print the last item, try `print $NF`. – TrueY Apr 19 '13 at 19:12
  • Same topic as [Can awk deal with CSV file that contains comma inside a quoted field?](http://stackoverflow.com/questions/3138363/can-awk-deal-with-csv-file-that-contain-comma-inside-a-quote), but it's helpful to index this asked in several different ways. – smci Jul 24 '14 at 01:24

3 Answers


GNU awk can handle this; you just need to set FPAT to describe what you consider a field:

$ awk '{print NF}' FPAT="([^,]+)|(\"[^\"]+\")" file
3
3

$ awk '{print $2}' FPAT="([^,]+)|(\"[^\"]+\")" file
"100.00"
"1,000.00"
Chris Seymour
  • thanks!!! I just changed FPAT to "([^,]*)|(\"[^\"]+\")" and now can also capture empty fields. – Dinesh Apr 19 '13 at 19:00
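Since the asker mentions working in Python, the same end-to-end cleanup (quote-aware splitting plus stripping the thousands separators for PostgreSQL COPY) can be sketched with the standard csv module; the file contents are inlined here for illustration:

```python
# Sketch: parse with Python's standard csv module (which understands
# quoted fields), strip the thousands separators from the amount column,
# and emit plain CSV ready for PostgreSQL COPY.
import csv
import io

raw = 'SD-32MM-1001,"100.00",4/11/2012\nSD-32MM-1001,"1,000.00",4/12/2012\n'

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for code, amount, date in csv.reader(io.StringIO(raw)):
    # the reader has already removed the quoting, so only the
    # embedded commas need to go
    writer.writerow([code, amount.replace(",", ""), date])

print(out.getvalue(), end="")
# SD-32MM-1001,100.00,4/11/2012
# SD-32MM-1001,1000.00,4/12/2012
```

With no commas left in the fields, the default minimal quoting means the writer emits no quotes at all.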

Using a script and a proper parser (the robust solution; awk and cut are not suited to this particular need):

use strict; use warnings;

use Text::CSV;

my @rows;
my $csv = Text::CSV->new ()
                or die "Cannot use CSV: ".Text::CSV->error_diag ();

open my $fh, "<:encoding(utf8)", "/tmp/file.csv" or die "$!";
while ( my $row = $csv->getline( $fh ) ) {

    # printing line 2, last field
    $. == 2 and print $row->[-1];
}
$csv->eof or $csv->error_diag();
close $fh;

Output

4/12/2012
Gilles Quénot
  • Thanks sputnick. I am doing this with python but only reluctantly: I would take an "-xyz" option over writing code :) – Dinesh Apr 19 '13 at 18:03
  • +1, not for perl specifically, but for the suggestion of using a language with a proper, road-tested CSV parsing library. – glenn jackman Apr 19 '13 at 21:46

The suggestions from sudo_O should work, unless your fields have double quotes inside them, which can happen in valid CSV data, e.g.:

field1,field2,"field,3","field4 ""has some quotes"" in it",field5

To handle these, you can wrap the standard UNIX commands like cut, awk, etc. with a program I wrote called csvquote, like this:

csvquote /tmp/foo.csv | tail -n +2 | awk -F, '{print NF}'

This works by finding the commas inside quoted fields and replacing them temporarily with nonprinting characters that awk can safely handle. Then when you want to create output from the fields, the pipeline will need to restore those commas:

csvquote /tmp/foo.csv | cut -d, -f2 | csvquote -u

You can find the code here: https://github.com/dbro/csvquote
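The substitution trick described above can be sketched in a few lines of Python (not the real csvquote, just the idea), using \x1f, the ASCII unit separator, as the sentinel character:

```python
# Sketch of csvquote's idea: replace commas that fall inside
# double-quoted fields with a nonprinting sentinel, so naive
# comma-splitting downstream is safe, then restore them afterwards.
SENTINEL = "\x1f"

def encode(line):
    out, in_quotes = [], False
    for ch in line:
        if ch == '"':
            # a doubled "" toggles twice, so the net state is unchanged,
            # which is what we want for escaped quotes
            in_quotes = not in_quotes
        elif ch == "," and in_quotes:
            ch = SENTINEL
        out.append(ch)
    return "".join(out)

def decode(line):
    return line.replace(SENTINEL, ",")

row = 'SD-32MM-1001,"1,000.00",4/12/2012'
encoded = encode(row)
print(len(encoded.split(",")))   # 3, so a naive split is now safe
print(decode(encoded) == row)    # True, the transform round-trips
```

The real tool does this as a stream filter in C; the round-trip property (decode after encode returns the original line) is what makes it safe to slot into a pipeline.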

D Bro