I have a csv in the format name,id,logindate
where logindates appear as "July 15, YYYY HH:mm:ss"
e.g. abc,123,"July 15, YYYY HH:mm:ss".
Please note that the first 5 lines contain headers and other information that should be skipped, so a sample csv file may look like:
AuditReport
asdf
qwerty
asdf
name, id, logindate
experiment,182002, "July 31, 2022 20:00:00"
unit 1998,183065, "July 3, 2022 21:00:00"
asdf, 202065, "May 25, 2022 20:00:00"
For my output, I would like to get the following (with the headers removed):
experiment,182002, "July 31 2022 20:00:00"
unit 1998,183065, "July 3 2022 21:00:00"
asdf, 202065, "May 25 2022 20:00:00"
My main task is to parse the commas properly even when one is included inside a quoted string.
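To see where the extra column comes from, splitting a sample row on bare commas yields four fields instead of three:

```shell
# Naive comma splitting: the quoted date breaks into two fields
echo 'experiment,182002, "July 31, 2022 20:00:00"' |
  awk -F, '{ print NF }'
# prints 4: $3 is ` "July 31` and $4 is ` 2022 20:00:00"`
```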
After much Google searching and going through several SO questions, I have come to the conclusion that a csv parser, some other language, or even GNU awk (using FPAT) would be a better tool for this, but I'm told that the production server at the company my dad works for uses awk, and it should be assumed not to be gawk. (I'm doing random small odd tasks to prepare myself for finding a job.)
I'm trying to work around this by removing the "" characters, splitting with FS=",", and then concatenating the last two columns back together. However, my output keeps giving me 4 columns (I am unable to concatenate the last two columns into one).
My code is:
awk 'BEGIN{FS=","} NR>5 {print}' sample.csv | awk '{ gsub("\"", "") } { $1=$1 } 1' | awk '{ print $1, $2, $3" "$4 }' > test.csv
I also tried the following:
https://stackoverflow.com/a/48386788/16034206
awk '{$2=$2"-"$3;$3=""} 1' Input_file
In my case:
awk 'BEGIN{FS=","} NR>5 {print}' sample.csv | awk '{ gsub("\"", "") } { $1=$1 } 1' | awk '{ $3=$3" "$4; $4="" } 1' > test.csv
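For reference, the workaround can also be expressed as a single POSIX awk invocation. This is just a sketch under the assumption (true for the sample above) that the only embedded comma is the one inside the quoted date: rejoining $3 and $4 directly removes that comma, and the space after it survives as the leading space of $4, so the quotes can stay in place as in the desired output.

```shell
# Rejoin the split date field in one pass (assumes exactly one
# embedded comma per row, inside the quoted last field):
awk -F, 'NR>5 { print $1 "," $2 "," $3 $4 }' sample.csv > test.csv
# test.csv then contains e.g.:
#   experiment,182002, "July 31 2022 20:00:00"
```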