19

i have to use awk to print out 4 different columns in a csv file. The problem is the strings are in a $x,xxx.xx format. When I run the regular awk command.

awk -F, {print $1} testfile.csv 

my output `ends up looking like

307.00
$132.34
30.23

What am I doing wrong.

"$141,818.88","$52,831,578.53","$52,788,069.53" this is roughly the input. The file I have to parse is 90,000 rows and about 40 columns This is how the input is laid out or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.

If the input is "$307.00","$132.34","$30.23" I want the output to be in a

$307.00
$132.34
$30.23
SiegeX
  • 135,741
  • 24
  • 144
  • 154
Dudusmaximus
  • 215
  • 1
  • 3
  • 6
  • Provide a sample input and I'll see what you can do about the output. – JUST MY correct OPINION Dec 04 '10 at 01:43
  • OK, giving sample input that doesn't even come close to resembling the actual input is worthless. Give. Me. **REPRESENTATIVE** Sample. Input. – JUST MY correct OPINION Dec 04 '10 at 02:30
  • Possible duplicate of [Parse a csv using awk and ignoring commas inside a field ](http://stackoverflow.com/questions/4205431/parse-a-csv-using-awk-and-ignoring-commas-inside-a-field). There's a link in an answer to that question which goes to an AWK script that handles CSV files. In general, though, it's better to use a tool specifically designed for CSV files or a module for Python or Perl. – Dennis Williamson Dec 04 '10 at 02:54
  • I wish I could use something else. But i'm required to use awk to parse it. – Dudusmaximus Dec 04 '10 at 03:00
  • 1
    Please post an input example and the desired PAIRED output – Dr. belisarius Dec 04 '10 at 03:05
  • @Dudusmaximus: I too faced this problem some time ago and was able to solve it quite elegantly with just a bit of field separator magic. [See my answer](http://stackoverflow.com/questions/4351434/parse-a-csv-file-that-contains-commans-in-the-fields-with-awk/4352084#4352084) – SiegeX Dec 04 '10 at 05:35

4 Answers4

20

Oddly enough I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s).

awk -F'","|^"|"$' '{print $2}' testfile.csv 

Input

# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"

Output

# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48

You'll note that the "first" field is actually $2 because of the field separator ^". Small price to pay for a short 1-liner if you ask me.

SiegeX
  • 135,741
  • 24
  • 144
  • 154
  • 2
    Very slick! Building on this method, here's a way to dispose of that pesky empty first field so the field numbers start with `$1` as usual: `awk -F'","|^"|"$' '{sub("^\"","")} {print $1}'` – Kamal Dec 04 '10 at 19:15
  • 3
    Will this work when not every field uses quotes? eg. for `ANAD,2.69,183.38,446.31,2.90,41.46,"Technology","Semiconductor - Integrated Circuits",,2.34,40.10%,-51.88%,33.17%,-16.46%,"Anadigics, Inc.",3.18%,"USA",,` So I am trying to grab only the `"Anadigics, Inc."` in position `$15`, when `$1=="ANAD"` – Marcos Feb 25 '12 at 13:21
  • @Marcos no, sorry it won't. However, all you need to use is a comma as a field separator, so `-F','` – SiegeX Feb 26 '12 at 23:48
  • 1
    Only `"Anadigics` returns when I `stock="ANAD"; awk -F',' '$1=="$stock" {print $15}' AllStocks.csv` but thanks – Marcos Feb 27 '12 at 09:01
  • @Marcos that's because that's not how you pass variables to `awk`. The `$stock` will never expand because the entire awk command is inside single quotes. you need to do `stock="ANAD"; awk -F',' '$1==stock{print $15}' stock="$stock" AllStocks.csv` – SiegeX Feb 27 '12 at 17:43
  • @SiegeX Good catch. Actually that was just a typo here only. All that & more happens inside my script. What I should have reduced it to for testing here was: `stock="ANAD"; awk -F',' '$1=="'$stock'" {print $15}' AllStocks.csv` run from a `bash` shell – Marcos Feb 27 '12 at 18:08
  • I m using @Kamal 's answer with `-F'","'` However, it does not have any of the issues mentioned by both you and Kamal. ie empty first field, stray double quotes etc. Can you explain how the solution works in your answer? my csv is something like `fieldA,fieldB,"fieldC1 , fieldC2"` – Tan Yu Hau Sean Jul 31 '22 at 05:18
8

I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...

First, use "," as the field separator, like this:

awk -F'","' '{print $1}'

But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping quotes out with gsub, like this:

awk -F'","' '{x=$1; gsub("\"","",x); print x}'

Result:

echo '"abc,def","ghi,xyz"' | awk -F'","' '{x=$1; gsub("\"","",x); print x}'

abc,def
Kamal
  • 7,160
  • 2
  • 21
  • 12
  • omg thank you that worked perfectly. i've been stuck on this for the past 2 days – Dudusmaximus Dec 04 '10 at 03:24
  • Great! Please be sure to click the green check mark indicating that this solved the problem for you. – Kamal Dec 04 '10 at 03:41
  • 2
    You can do this without the need for gsub() and thus additional variables. The key is to use multiple field separators with `-F'","|^"|"$'` (see my answer). – SiegeX Dec 04 '10 at 05:33
3

In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. Like this:

csvquote testfile.csv | awk -F, {print $1} | csvquote -u

This would also work with any other UNIX text processing program like cut:

csvquote testfile.csv | cut -d, -f1 | csvquote -u

You can get the csvquote code here: https://github.com/dbro/csvquote

D Bro
  • 543
  • 6
  • 10
  • Glad I found this great utility! I finally found a reliable way to parse mysqldump output on servers which do not have Select into outfile permissions. – Michael Blood Apr 19 '17 at 06:09
1

The data file:

$ cat data.txt
"$307.00","$132.34","$30.23"

The AWK script:

$ cat csv.awk
BEGIN { RS = "," }
{ gsub("\"", "", $1);
  print $1 }

The execution:

$ awk -f csv.awk data.txt
$307.00
$132.34
$30.23
JUST MY correct OPINION
  • 35,674
  • 17
  • 77
  • 99
  • The OP wasn't very clear in his question but his problem happens when the fields themselves have commas in them. See my answer for a work around to this. – SiegeX Dec 04 '10 at 05:27
  • I took his input and generated his desired output. If he wanted something else, he should have asked for that. ;) – JUST MY correct OPINION Dec 04 '10 at 07:06