0

I have the following CSV file:

ID,COUNTRY,DESCRIPTION,PRICE
1,USA,Short description,11
2,UK,"Description with comma , inside",2.3

So if field contains comma, it is put in double quotes.

How to replace delimiter from comma to tabulator but ignore commas in quotes? So I will have:

ID    COUNTRY    DESCRIPTION    PRICE
1    USA    Short description    11
2    UK    "Description with comma , inside"    2.3

or without double quotes:

ID    COUNTRY    DESCRIPTION    PRICE
1    USA    Short description    11
2    UK    Description with comma , inside    2.3

I use this code before I noticed that some fields contain commas:

$VAR='\t'
sed -i $"s/,/$VAR/" $FILE_NAME
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
user1632454
  • 41
  • 1
  • 8
  • 1
    On So its highly encouraged to add efforts in questions. So kindly do add your efforts in form of code in your question and let us know then. – RavinderSingh13 Aug 31 '20 at 13:54
  • 1
    it was mistake, edited – user1632454 Aug 31 '20 at 13:59
  • Look into `csvkit` tools. – Shawn Aug 31 '20 at 19:42
  • 1
    Don't you need to convert tabs in your data to blanks or something else so you don't end up with more fields than you started with? See also [whats-the-most-robust-way-to-efficiently-parse-csv-using-awk](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk). – Ed Morton Sep 01 '20 at 12:02

3 Answers3

5

With GNU awk, could you please try following written with shown samples(In case you want to make edit within Input_file itself then append > temp && mv temp Input_file in following solutions too).

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" '{$1=$1}1' Input_file

OR to remove " from starting and ending of the fields of line then run following.

awk -v FPAT='[^,]*|"[^"]+"' -v OFS="\t" '
{
  for(i=1;i<=NF;i++){
    gsub(/^"|"$/,"",$i)
  }
  $1=$1
}
1
' Input_file
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • 1
    thanks for this answer, that FPAT looks to be exactly what I needed...unfortunately it does not work in our system, because we have lower awk installed (< 4.0) – user1632454 Sep 01 '20 at 06:38
  • @user1632454, sure no issues cheers and happy learning. – RavinderSingh13 Sep 01 '20 at 07:07
  • 1
    @user1632454 you should update your gawk version as you're about 10 years out of date and missing a few bug fixes and a ton of extremely useful new functionality. – Ed Morton Sep 01 '20 at 12:03
3

Ruby has a nice csv module:

ruby -rcsv -e '
  out = CSV.new($stdout, col_sep: "\t")
  CSV.foreach(ARGV.shift) {|row| out << row}
' file.csv

Will automatically quote output fields with tabs:

$ cat -A file.csv
ID,COUNTRY,DESCRIPTION,PRICE$
1,USA,Short description,11$
2,UK,"Description with comma , inside",2.3$
3,CA,"Description with^Itab inside",2.3$

$ ruby -rcsv -e '
  out = CSV.new($stdout, col_sep: "\t")
  CSV.foreach(ARGV.shift) {|row| out << row}
' file.csv
ID  COUNTRY DESCRIPTION PRICE
1   USA Short description   11
2   UK  Description with comma , inside 2.3
3   CA  "Description with   tab inside" 2.3
glenn jackman
  • 238,783
  • 38
  • 220
  • 352
3

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^"]*),/\1\n/;ta;y/,/\t/;y/\n/,/' file

Convert all commas inside double quotes to newlines, convert remaining commas to tabs (or whatever) and then convert newlines back to commas.

N.B. This will not work if the new delimiter is a newline or if there are quoted double quotes inside double quotes.

potong
  • 55,640
  • 6
  • 51
  • 83