Change delimiter in not regular CSV

Question

I have the following CSV file:

ID,COUNTRY,DESCRIPTION,PRICE
1,USA,Short description,11
2,UK,"Description with comma , inside",2.3

So if field contains comma, it is put in double quotes.

How to replace delimiter from comma to tabulator but ignore commas in quotes? So I will have:

ID    COUNTRY    DESCRIPTION    PRICE
1    USA    Short description    11
2    UK    "Description with comma , inside"    2.3

or without double quotes:

ID    COUNTRY    DESCRIPTION    PRICE
1    USA    Short description    11
2    UK    Description with comma , inside    2.3

I use this code before I noticed that some fields contain commas:

$VAR='\t'
sed -i $"s/,/$VAR/" $FILE_NAME

On So its highly encouraged to add efforts in questions. So kindly do add your efforts in form of code in your question and let us know then. — RavinderSingh13, Aug 31 '20 at 13:54
Don't you need to convert tabs in your data to blanks or something else so you don't end up with more fields than you started with? See also [whats-the-most-robust-way-to-efficiently-parse-csv-using-awk](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk). — Ed Morton, Sep 01 '20 at 12:02

RavinderSingh13 · Answer 1 · 2020-08-31T14:05:44.800

5

With GNU awk, could you please try following written with shown samples(In case you want to make edit within Input_file itself then append > temp && mv temp Input_file in following solutions too).

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" '{$1=$1}1' Input_file

OR to remove " from starting and ending of the fields of line then run following.

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" '
{
  for(i=1;i<=NF;i++){
    gsub(/^"|"$/,"",$i)
  }
  $1=$1
}
1
' Input_file

edited Aug 31 '20 at 14:05

answered Aug 31 '20 at 14:00

RavinderSingh13

130,504
14
57
93

1

thanks for this answer, that FPAT looks to be exactly what I needed...unfortunately it does not work in our system, because we have lower awk installed (< 4.0) – user1632454 Sep 01 '20 at 06:38
@user1632454, sure no issues cheers and happy learning. – RavinderSingh13 Sep 01 '20 at 07:07
1

@user1632454 you should update your gawk version as you're about 10 years out of date and missing a few bug fixes and a ton of extremely useful new functionality. – Ed Morton Sep 01 '20 at 12:03

score 3 · Answer 2 · answered Aug 31 '20 at 14:16

Ruby has a nice csv module:

ruby -rcsv -e '
  out = CSV.new($stdout, col_sep: "\t")
  CSV.foreach(ARGV.shift) {|row| out << row}
' file.csv

Will automatically quote output fields with tabs:

$ cat -A file.csv
ID,COUNTRY,DESCRIPTION,PRICE$
1,USA,Short description,11$
2,UK,"Description with comma , inside",2.3$
3,CA,"Description with^Itab inside",2.3$

$ ruby -rcsv -e '
  out = CSV.new($stdout, col_sep: "\t")
  CSV.foreach(ARGV.shift) {|row| out << row}
' file.csv
ID  COUNTRY DESCRIPTION PRICE
1   USA Short description   11
2   UK  Description with comma , inside 2.3
3   CA  "Description with   tab inside" 2.3

score 3 · Accepted Answer · answered Aug 31 '20 at 14:54

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^"]*),/\1\n/;ta;y/,/\t/;y/\n/,/' file

Convert all commas inside double quotes to newlines, convert remaining commas to tabs (or whatever) and then convert newlines back to commas.

N.B. This will not work if the new delimiter is a newline or if there are quoted double quotes inside double quotes.

Change delimiter in not regular CSV

3 Answers3