My regex doesn't work as awk's command-line field separator on a CSV file.
My CSV is separated by commas (,), but some fields contain commas themselves.
The data.csv looks like this:
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
field without comma,f22,f23,f34
If we look at the line field, with comma,f12,f13,f14 we see two kinds of commas:
- a comma that is part of the data (inside the field), as in field, with comma; and
- commas that separate fields, as in ,f12,f13,f14.
So I tried awk with -F and a regex:
awk -F'/\B\,/\B/' '!seen[$2]++' data.csv > resulted.csv
My strategy was: the field separator needs to be a comma (\,) at a non-word boundary (\B).
However, my command didn't output the resulted.csv I wanted. Instead, it printed these warnings:
gawk: warning: escape sequence `\B' treated as plain `B'
gawk: warning: escape sequence `\,' treated as plain `,'
The desired resulted.csv should have the repeated lines removed, like this:
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
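
For reference, the deduplication described above can be sketched without a regex field separator. This is only a sketch under one assumption that holds for the sample but is not guaranteed in general: a comma that belongs to the data is always followed by a space, and a separating comma never is. The placeholder byte \034 and the names line and f are illustrative choices, not part of the original command.

```shell
# Recreate the sample data.csv from the question.
cat > data.csv <<'EOF'
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
field without comma,f22,f23,f34
EOF

# Assumption: "comma + space" is data; any other comma separates fields.
awk '{
    line = $0                    # work on a copy so the printed line is unchanged
    gsub(/, /, "\034", line)     # hide data commas behind a placeholder byte
    split(line, f, ",")          # split the copy on the remaining commas
    if (!seen[f[2]]++) print     # keep only the first line for each second field
}' data.csv > resulted.csv
```

With the sample data, resulted.csv should contain the four lines listed above. If a real field can contain a comma that is not followed by a space, this heuristic will split it incorrectly, and quoting the fields in the CSV would be the more robust route.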