
I am trying to parse a CSV file in which every field is enclosed in double quotes (") and fields are separated by commas (,), using the code below:

split($0,data,",")
print "\""data[1]"\",\""data[7]"\",\""data[2]

It should take the columns separately and perform operations on them if needed, so please don't advise printing the line as is ;) The problem is the last column: it's grabbed together with the '\n' symbol, so the next column overwrites my current line. Initial file:

"00:00:00","87100","2381","",""," ","13"
"00:00:01","56270","0098","",""," ","37"
"00:00:01","86917","0942","",""," ","12"

so instead of this:

 "00:00:00","13","87100"
 "00:00:01","37","56270"
 "00:00:01","12","86917"

I'm getting this:

","87100"
","87100"
","87100"

The start of each line ("data[1]","data[3]) is being overwritten. When I removed the last column from the print list, it worked fine. Also, I can't add commas after the last column; that is too much. Any other advice on the code?

jas
amelie
  • 2
    it would help if you add 3-5 lines of input and show complete expected output for that particular sample - adds clarity as well as makes it easy to test solutions before answering.. – Sundeep Oct 23 '18 at 12:52
  • 1
    @Sundeep edited, take a look please – amelie Oct 23 '18 at 12:57
  • What do you mean by 'grabbed with \n'? – oguz ismail Oct 23 '18 at 13:05
  • 2
    See https://ideone.com/NExIAJ, it seems you may use `awk -F',' '{print $1 "," $7 "," $2}' file > outfile` – Wiktor Stribiżew Oct 23 '18 at 13:07
  • 3
    I believe you are suffering from a nasty case of `CRLF` in your input file. Have a look at [Why does my tool output overwrite itself and how do I fix it?](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) – kvantour Oct 23 '18 at 13:20
  • 1
    CSV files are only [loosely standardized](https://tools.ietf.org/html/rfc4180) and I'm not sure `awk` is the tool that can actually decode them without a whole ton of dark hackery. Why not use a proper CSV parser? – tadman Oct 23 '18 at 15:27

1 Answer

1

Rather than splitting each line with split(), specify the field separator as ',' (using -F). awk then splits every line into $1, $2, ... for you, which makes it much simpler to print each field (still enclosed in quotes). You can still access the entire line as $0.

awk -F',' '{print $1","$7","$2}' csv_file
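If the input really does have Windows-style CRLF line endings, as one of the comments suggests, the last field would still end in a carriage return and the output would keep overwriting itself. A minimal sketch, assuming that is the cause: the sample rows are taken from the question, the `\r\n` endings are added by hand to reproduce the symptom, and `out` is just an illustrative variable name.

```shell
# Assumption: each record ends in CRLF (\r\n), reproduced here with printf.
# sub(/\r$/, "") removes the trailing carriage return from $0; awk then
# re-splits the fields, so $7 no longer ends in \r.
out=$(printf '"00:00:00","87100","2381","",""," ","13"\r\n"00:00:01","56270","0098","",""," ","37"\r\n' |
  awk -F',' '{ sub(/\r$/, ""); print $1 "," $7 "," $2 }')

printf '%s\n' "$out"
```

On a real file, the same awk program can be run as `awk -F',' '{ sub(/\r$/, ""); print $1 "," $7 "," $2 }' csv_file`. If the file turns out to have plain LF endings, the `sub` call simply matches nothing and the output is unchanged.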
flu