I have a file like this:

"1","ab,c","def"

Using only a comma as the field delimiter gives the wrong result, so I want to use "," (quote, comma, quote) as the field delimiter. I tried this:

awk -F "," '{print $0}' file

or like this:

awk -F "","" '{print $0}' file

or like this:

awk -F '","' '{print $0}' file

but the result is incorrect. I don't know how to include the double quotes as part of the field delimiter itself.

Marcus Müller
tonyibm
  • `-F '","'` seems to work (as in split at literal `","`, not magically becoming a ‘quote-aware parser’). How are you testing it if you only `print $0`, though?! – Biffen Nov 10 '21 at 10:27
  • You don't want to use `"` as part of the field delimiter, you want your program to "lex" (as in: to divide into separate "tokens") respecting quotation marks, that's a different problem! – Marcus Müller Nov 10 '21 at 10:27
  • See [Escaping separator within double quotes, in awk](https://stackoverflow.com/q/7804673/3832970) – Wiktor Stribiżew Nov 10 '21 at 10:27
  • '","' does not work; the $0 in my post is a typo. – tonyibm Nov 10 '21 at 10:32
  • `echo '"1","ab,c","def"' | awk -F '","' '{print $1}'` works for me and prints `"1` correctly. If you want to parse a CSV file with quotes, consider not reinventing the wheel and using an existing parser or library. From a 5-minute Google search: https://stackoverflow.com/questions/3138363/can-awk-deal-with-csv-file-that-contains-comma-inside-a-quoted-field https://coderwall.com/p/mplocg/simple-quote-comma-csv-parsing-in-awk https://github.com/geoffroy-aubry/awk-csv-parser . But really, use csvtool – KamilCuk Nov 10 '21 at 10:37
  • OK, I didn't realize there are existing tools. – tonyibm Nov 10 '21 at 10:39

3 Answers

If you can handle GNU awk, you could use FPAT:

$ echo '"1","ab,c","def"' |        # echo outputs with double quotes
gawk '                             # use GNU awk
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"    # a field is a run of non-commas or a quoted string
}
{
    for(i=1;i<=NF;i++)             # loop all fields
        gsub(/^"|"$/,"",$i)        # remove leading and trailing double quotes
    print $2                       # output for example the second field
}'

Output:

ab,c

Note that FPAT cannot handle the record separator (RS, a newline by default) inside the quotes.

James Brown

What you are attempting seems misdirected anyway. How about this instead?

awk '/^".*"$/{ sub(/^\"/, ""); sub(/\"$/, ""); gsub(/\",\"/, ",") }1'

The proper solution to handling CSV files with quoting in them is to use a language which has an actual CSV parser. My thoughts go to Python, which includes a csv module in its standard library.

tripleee

In GNU AWK

{print $0}

prints the whole line. If no change has been made, the original line is printed; no matter what field separator you set, you get the original lines back if the only action is print $0. Use $1=$1 to trigger a rebuild of the line.
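The difference can be seen with a small sketch. The variable names `line`, `unchanged` and `rebuilt` are made up for illustration, and the separator `","` is the one from the question:

```shell
line='"1","ab,c","def"'

# $0 is never modified here, so the separator has no visible effect
unchanged=$(printf '%s\n' "$line" | awk -F '","' '{ print $0 }')

# assigning to a field makes awk rebuild $0, joining fields with OFS (a space by default)
rebuilt=$(printf '%s\n' "$line" | awk -F '","' '{ $1 = $1; print $0 }')

printf '%s\n' "$unchanged" "$rebuilt"
```

The first line printed is the input as-is; the second is `"1 ab,c def"`, which shows that the line really was split at each literal `","`.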

If you must do it via FS AT ANY PRICE, then you might do it as follows. Let the content of file.txt be

"1","ab,c","def"

then

BEGIN{FS="\x22,?\x22?"}{$1=$1;print $0}

output

 1 ab,c def

Note the leading space: the line starts with a separator, so $1 is empty and ab,c is $3. Explanation: I tell GNU AWK that the field separator is a literal " (\x22; " is 22 hex in ASCII), followed by zero or one (?) comma, followed by zero or one (?) literal " (\x22). $1=$1 triggers the line rebuild mentioned earlier. Disclaimer: this solution assumes that you never have an escaped " inside your strings.

(tested in gawk 4.2.1)
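A possible variant that avoids the empty leading field, as a sketch only: strip the outer quotes first, then split on the literal `","`. It relies on the same assumption as above (no escaped " inside the strings), and the variable names `line` and `second` are made up for illustration:

```shell
line='"1","ab,c","def"'

second=$(printf '%s\n' "$line" | awk -F '","' '{
    sub(/^"/, "")    # drop the opening quote; changing $0 re-splits the fields
    sub(/"$/, "")    # drop the closing quote
    print $2         # fields are now: 1 / ab,c / def
}')

printf '%s\n' "$second"
```

This prints `ab,c`, and it needs no gawk-specific features such as FPAT or \x22.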

Daweo