1

I have a CSV file that I am trying to separate by field within a Unix environment. The CSV file is separated by commas, and some fields have values that use multiple commas like so:

NAME,DATE,SIZE,COLORS,TICKET

James, 11/20/10, XL, "YELLOW, BLUE, RED", true

Amy, 11/25/10, S, "BLUE, GREEN", false

Jack, 12/01/10, M, GREEN, true

Running a cut command with a comma delimiter doesn't properly split the fields because it counts the commas used inside some of the COLORS values. The same issue happens when using the awk command and applying -F or -FS:

awk  '{print $4,$5}' FS="," file.csv
awk -F"," '{print $4,$5}' file.csv
cut -d ',' -f4,5 file.csv

All of the code used above incorrectly split the fields and output something like this:

COLORS TICKET

"YELLOW BLUE

"BLUE GREEN

GREEN true

Is there a way to apply a delimiter that splits the fields of the CSV properly and that would account for exception values where it's wrapped by double quotes?

Community
  • 1
  • 1

1 Answers1

1

If you have gawk you can try FPAT variable to define fields instead of specifying field delimiter. However, in your particular case it might be easier just to filter out the first 3 fields

$ awk '{sub(/([^,]+,){3} ?/,"")}1' file

COLORS,TICKET
"YELLOW, BLUE, RED", true
"BLUE, GREEN", false
GREEN, true

or similarly with sed

$ sed -E 's/([^,]+,){3} ?//'
karakfa
  • 66,216
  • 7
  • 41
  • 56
  • Thank you! For using awk or sed, is there a way to filter out selected fields? Such as filtering out the 1st, 2nd, and 4th but keeping the 3rd and 5th. – mistermoose Mar 06 '18 at 04:39
  • `awk '{print $3,$NF}' file` should do in this case as well. – karakfa Mar 06 '18 at 13:08