I have a .csv with 470 columns and tens of thousands of product rows. Many fields are quoted text strings that contain commas, and those embedded commas make my awk statements split fields in the wrong places and write values into the wrong columns, corrupting my data. Here is a sample of the input and the pipeline I'm using:
Input example:
LDB0010-300,"TIMELESS DESIGN: Classic, Elegant, Beautiful",Yes,1,Live,...
LDB0010-400,"CLASSIC DESIGN: Contemporary look",No,0,Not Live,...
LDB0010-500,"Everyone should wear this, almost!",Yes,0,Not Live,...
Code:
cat products.csv | sed -e 's/, /@/g' | awk -F, 'NR>1 {$308="LIVE" ; $310="LIVE" ; $467=0 ; print $0}' OFS=, | sed -e 's/@/, /g'
Current output, which is wrong: values end up written into the wrong columns:
LDB0010-300,"TIMELESS DESIGN: Classic",LIVE, Beautiful",Yes,1,Live,...
LDB0010-400,"CLASSIC DESIGN: Contemporary look",No,0,0,...
LDB0010-500,"Everyone should wear this",LIVE,Yes,0,Not Live,...
Studying the data more closely, I noticed that commas inside the text descriptions were always followed by a space, whereas commas used as delimiters were not. So my approach was to substitute comma-space with '@', run the awk statement to set the values of those columns, then substitute '@' back to comma-space. That looked fine until I opened the spreadsheet and found many rows with values written into the wrong columns (two hypothetical rows that defeat the substitution are shown below). Is there a more robust way to do this that will prevent these blow-outs?
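For reference, here are two made-up rows (hypothetical product codes, but the same shape as my data) that each slip past the substitution; either one shifts every later field by a column:

# Made-up rows; my real file matches these in shape only.
# A quoted comma with no trailing space is never protected, so awk
# splits the field and shifts later columns right:
printf '%s\n' 'LDB0010-600,"COLORS: red,green",Yes' | sed -e 's/, /@/g' | awk -F, '{print NF}'   # prints 4, not 3

# A delimiter followed by a space IS protected, so awk sees one field
# too few and shifts later columns left:
printf '%s\n' 'LDB0010-700,"CLASSIC DESIGN", Yes,1' | sed -e 's/, /@/g' | awk -F, '{print NF}'   # prints 3, not 4

# (And any literal '@' already in the data would be rewritten to ', '
# by the final sed, which is another way rows can get corrupted.)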