How to use awk to filter a .csv file

Question

I was wondering how I can get the names of the fruits from this .csv file using awk or some other CLI tool.

I used a macro in vim to edit the file, but I would think that there is an easy one-liner that does the same.

"1000","Apple","4","133"
"1028","Lemon","3","120"
"1029","Lime","3","165"
"1030","Lychee","6","120"
"1031","Mango","6","131"
"1032","Mangostine","1","181"
"1033","Melon","4","159"
"1034","Cantaloupe","4","138"
"1035","Honeydew melon","4","155"
"1036","Watermelon","5","176"
"1037","Rock melon","2","180"
"1038","Nectarine","1","128"
"1039","Orange","6","142"
"1040","Peach","6","179"
"1041","Pear","3","102"
"1042","Williams pear or Bartlett pear","1","164"
"1043","Pitaya","2","170"
"1044","Physalis","5","166"
"1045","Plum/prune (dried plum)","4","103"
"1046","Pineapple","3","120"
"1047","Pomegranate","5","112"
"1048","Raisin","4","111"
"1049","Raspberry","5","156"
"1050","Western raspberry (blackcap)","6","173"

The final result I want looks like this:

Apple
Lemon
Lime
Lychee
Mango
Mangostine
Melon
Cantaloupe
Honeydew melon
Watermelon
Rock melon
Nectarine
Orange
Peach
Pear
Williams pear or Bartlett pear
Pitaya
Physalis
Plum/prune (dried plum)
Pineapple
Pomegranate
Raisin
Raspberry
Western raspberry (blackcap)

I realize that this is a duplicate:

What's the most robust way to efficiently parse CSV using awk?

How to parse a CSV in a Bash script?

Using the linked duplicate, you quickly come to the answer: `awk -v FPAT='[^,]*|"[^"]+"' '{print $2}' file.csv` - kvantour, Feb 03 '21 at 21:30
Here are some other methods too. Method 1: `grep -o "[a-zA-Z() ]*" fruits.csv` matches runs of the desired characters (names containing any other character, such as the `/` in `Plum/prune (dried plum)`, are split across lines). Method 2: `cut -d"," -f2 fruits.csv | sed 's/"//g'` defines a delimiter with `-d","`, chooses a field with `-f2`, and removes the quotes with sed. Method 3: `sed 's/[^,]*,//;s/,.*//;s/"//g' fruits.csv` removes everything up through the first comma, removes everything from the second comma onward, and removes the quotes. - Barak Binyamin, Feb 04 '21 at 15:12

Cyrus · Accepted Answer · 2021-02-04T05:59:03.103

3

I suggest:

awk -F '","' '{print $2}' file

Use "," (quote, comma, quote) as the field separator and print the second column.
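
To see what this separator does, here is a small sketch run on three rows from the question. The here-document and the function name `second_field` are only illustrative and are not part of the original answer.

```shell
# Hypothetical helper: print the second field of a fully quoted CSV stream.
second_field() {
    # The separator is the three-character string "," so the quotes
    # around the inner fields are consumed together with the comma.
    awk -F '","' '{print $2}'
}

second_field <<'EOF'
"1000","Apple","4","133"
"1035","Honeydew melon","4","155"
"1045","Plum/prune (dried plum)","4","103"
EOF
```

With this separator the first and last fields each keep one quote ($1 is `"1000` and $4 is `133"`), so the approach is simplest for the inner columns.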

edited Feb 04 '21 at 05:59

answered Feb 03 '21 at 21:07

Ravi Saroch · Answer 2 · 2021-02-04T09:56:43.797

1

Using a combination of sed and awk:

sed -e 's/^"//;s/","/\t/g;s/"//g' Input.csv | awk -F'\t' '{print $2}'

or

awk -F, '{print $2}' Input.csv | sed 's/"//g'

Either command prints a different column if you change the field number in awk.
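
Changing the column number can be made explicit by passing it in as a variable. This is a minimal sketch, assuming no field contains a comma or an embedded quote. The name `csv_column` is made up, and `tr -d '"'` stands in for the sed step.

```shell
# Hypothetical helper: print column $1 (a number) of a simple quoted CSV on stdin.
csv_column() {
    # Split on commas, pick the requested field, then delete every double quote.
    awk -F, -v col="$1" '{print $col}' | tr -d '"'
}

# Second column: the fruit names.
printf '%s\n' '"1000","Apple","4","133"' '"1028","Lemon","3","120"' | csv_column 2

# Fourth column: the last number on each row.
printf '%s\n' '"1000","Apple","4","133"' '"1028","Lemon","3","120"' | csv_column 4
```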

edited Feb 04 '21 at 09:56

answered Feb 04 '21 at 09:43

score 0 · Answer 3 · answered Feb 03 '21 at 21:08

Use this Perl one-liner:

perl -F',' -lane '$F[1] =~ tr/"//d; print $F[1];' in_file > out_file

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

score 0 · Answer 4 · answered Feb 03 '21 at 21:13

GNU awk and gensub():

$ gawk '{print gensub(/^[^,]*,"|([^,])".*/,"\\1","g")}' file

Output

Apple
Lemon
Lime
...

answered Feb 03 '21 at 21:13

James Brown

Carlos Pascual · Answer 5 · 2021-02-04T09:05:49.723

With awk, removing the " characters only from the second field, and only at the beginning and at the end of that field:

awk -F',' '{gsub(/^"|"$/,"",$2);print $2}' file
Apple
Lemon
Lime
Lychee
Mango
Mangostine
Melon
Cantaloupe
Honeydew melon
Watermelon
Rock melon
Nectarine
Orange
Peach
Pear
Williams pear or Bartlett pear
Pitaya
Physalis
Plum/prune (dried plum)
Pineapple
Pomegranate
Raisin
Raspberry
Western raspberry (blackcap)
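
The difference from deleting every quote only shows up when the field itself contains a quote. Here is a small sketch with a made-up row (not from the question's file); `strip_edge_quotes` is a hypothetical name wrapping the command above.

```shell
# Hypothetical helper wrapping the awk command from this answer.
strip_edge_quotes() {
    # Remove a quote only at the very start and very end of field 2.
    awk -F',' '{gsub(/^"|"$/, "", $2); print $2}'
}

# A made-up row whose second field has quote characters in the middle.
printf '%s\n' '"2000","Big "red" apple","1","100"' | strip_edge_quotes
```

The inner quotes around `red` survive here, whereas a global `sed 's/"//g'` would remove them as well.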
