1

Starting Question

I have a CSV file which is formed this way (variable.csv)

E,F,G,H,I,J
a1,
,b2,b3
c1,,,c4,c5,c6

As you can see, the first and second columns do not have all the commas needed. Here's what I want:

E,F,G,H,I,J
a1,,,,,
,b2,b3,,,
c1,,,c4,c5,c6

With this, now every row has the right number of columns. In other words, I'm looking for a unix command which smartly appends the correct number of commas to the end of each row to make the row have the number of columns that we expect, based off the header.

Here's what I tried, based off of some searching: awk -F, -v OFS=, 'NF=6' variable.csv. This works in the above case, BUT...

Final Question

...Can we have this command work if the column data contains commas itself, or even new line characters? e.g.

E,F,G,H,I,J
"a1\n",
,b2,"b3,3"
c1,,,c4,c5,c6

to

E,F,G,H,I,J
"a1\n",,,,,
,b2,"b3,3",,,
c1,,,c4,c5,c6

(Apologies if this example's formatting is malformed due to the way the newline is represented.

MeanStackeD
  • 123
  • 2
  • 8
  • 1
    Your command seems to work for your example. `NF=6` is an assignment, returning true, and then prints with additional commas. Are there any rows it doesn't work? – thanasisp Jul 29 '20 at 01:01
  • @thanasisp yes, it seems to have been a machine-specific issue. In addition I realize the core of my question is missing a critical component, so I've updated the main question to add that. – MeanStackeD Jul 29 '20 at 03:31
  • 1
    `awk -v FPAT='([^,]*)|("[^"]+")' -v OFS=, 'NF=6' variable.csv` will work, ignoring commas inside quoted fileds. See: https://stackoverflow.com/questions/29642102/how-to-make-awk-ignore-the-field-delimiter-inside-double-quotes – thanasisp Jul 29 '20 at 03:54

1 Answers1

2

Short answer:

python3 -c 'import fileinput,sys,csv;b=list(csv.reader(fileinput.input()));w=max(len(i)for i in b);print("\n".join([",".join(i+[""]*(w-len(i)))for i in b]))' variable.csv

The python script may be long, but this is to ensure that all cases are handled. To break it down:

import fileinput,csv
b=list(csv.reader(fileinput.input()))                       # create a reader obj
w=max(len(i)for i in b)                                     # how many fields?
print("\n".join([",".join(i+[""]*(w-len(i)))for i in b]))   # output

BTW, in your starting problem

awk -F, -v OFS=, 'NF<6{$6=""}1' variable.csv

should work. (I think it's implementation or version related. Your code works on GNU awk but not on Mac version.)

dibery
  • 2,760
  • 4
  • 16
  • 25