How to handle CSV files with multiple comma separated values in the same column

Question

My CSV file format. Notice that the 6th row has multiple comma separated values for the same column.

a,b,c,d,e,f
g,h,i,j,k,l
m,n,o,p,q,r
s,t,u,v,w,x
y,z,ab,bc,cd,de
"a,b,c",d,e,f,g,"h,i,j"

Now, If I fire awk -F, '{print $1}', then for the 6th row, I get incorrect results.

$ awk -F, '{print $1}' test.csv
a
g
m
s
y
"a

How can I handle these multiple comma separated values for the same column with awk or with any other utility (sed, cut etc)?

Install and use ```csvtool```: ```csvtool col 1 test.csv```. There is already an existing answer for this — Jay jargot, May 02 '23 at 15:09

0stone0 · Answer 1 · 2023-05-02T15:07:59.630

0

You can use GNU-awk with a regex that captures between ""'s:

➜  /tmp cat test.csv
a,b,c,d,e,f
g,h,i,j,k,l
m,n,o,p,q,r
s,t,u,v,w,x
y,z,ab,bc,cd,de
"a,b,c",d,e,f,g,"h,i,j"
➜  /tmp
➜  /tmp
➜  /tmp awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; } { print $1 }' test.csv
a
g
m
s
y
"a,b,c"
➜  /tmp

The FPAT value is a regex that describes the content of each field (col).

So in your CSV example, the 'content of each field' is any value that is (optionally) wrapped in quotes ("").

For more information, refer to the gawk manual:

https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

edited May 02 '23 at 15:07

answered May 02 '23 at 14:58

0stone0

34,288
4
39
64

I've added some explanation, for more info, please check the official manual with the provided link. – 0stone0 May 02 '23 at 15:08

How to handle CSV files with multiple comma separated values in the same column

1 Answers1