Attempting to manipulate a messy 4GB txt file with a mixture of csv formatting. Data has around 38 'columns' as defined by the delimiter '"'. (Example to follow) The data was exported using commas as a field separator, but also has commas inline with data which makes it difficult to import into most platforms. Ultimately I believe using awk/sed/cat I can fix the data. Each column of data can be defined by using the quotations marks. I just can't figure out how.
I want my end file to be columns identified by what is within the two sets of quotations with all commas replace by a period or similar. The section containing the commas is in the middle of my columns, and is not the last field within the data set. I was attempting to rip out the section with commas with awk, use sed to replace them, and then use cat to paste it back into the file.
Actual data is sensitive and cannot be shared, the below example is akin though.
Data Sample:
"identifier","Status","Name","City","Application","Job","Details","column 39"
"red","paid","Dave","Philadelphia","55823","Cashier","No commas in this comment","spare1"
"rojo","past due","Steve","San Francisco","78434","trainer","Does not like sushi, beer, or ham","spare2"
"verde","pending","Duncan","Columbus","65478","CEO","Late for work, on the fifth","spare3"
Desired Result is focused on changing the comma, and adding data back inline or at the end after "column 39"
"identifier","Status","Name","City","Application","Job","Details","column 39"
"red","paid","Dave","Philadelphia","55823","Cashier","No commas in this comment","spare1"
"rojo","past due","Steve","San Francisco","78434","trainer","Does not like sushi. beer. or ham","spare2"
"verde","pending","Duncan","Columbus","65478","CEO","Late for work. on the fifth","spare3"
Any suggestions are much appreciated!