You can use awk to do a search/replace only inside the double-quoted parts.
The first step is to replace each , inside quotes by a temporary _ placeholder (this assumes the data contains no underscore of its own):
cat demo.txt | awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(",","_",$i)} 1'
which gives
3, "hh_1_foo", foo
"5___5", "1_2_3d___something ", foo2
test, "col3", foo3
Then replace the remaining , (those outside quotes) by ; with the more usual tr command:
tr ',' ';'
The last step uses awk again, in the reverse direction, to turn the temporary _ placeholder back into the original , character.
Putting everything together we have:
cat demo.txt |
awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(",","_",$i)} 1' |
tr ',' ';' |
awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub("_",",",$i)} 1'
which gives
3; "hh,1,foo"; foo
"5,,,5"; "1,2,3d,,,something "; foo2
test; "col3"; foo3
as expected.
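Note that the _ placeholder only works if the quoted text never contains an underscore; otherwise the last step would turn those underscores into commas too. A minimal sketch of the same pipeline with a less likely placeholder, assuming the data never contains the control character \001 (SOH). The function name protect_and_convert and the sample lines are made up for illustration:

```shell
# Same three-stage pipeline as above, wrapped in a function.
# Assumption: the byte \001 (SOH) never occurs in the input.
protect_and_convert() {
  # 1) inside quotes (even-numbered fields when splitting on "): , -> \001
  awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(",","\001",$i)} 1' |
  # 2) every comma still present is outside quotes: , -> ;
  tr ',' ';' |
  # 3) inside quotes: \001 -> , (restore the original commas)
  awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub("\001",",",$i)} 1'
}

# Hypothetical input; the second line has an underscore inside the quotes.
printf '%s\n' '3, "hh,1,foo", foo' 'test, "col_3,x", foo3' | protect_and_convert
# prints:
# 3; "hh,1,foo"; foo
# test; "col_3,x"; foo3
```

With the original _ placeholder, the second line would come out as "col,3,x" instead.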
UPDATE: the fastest solution?
I benchmarked the 3 answers I got on a 206 MB CSV file (with several runs to account for cache effects). Here are the typical results I get:
1/ My initial answer:
time cat avec_vapeur.csv | awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(",","_",$i)} 1' | tr ',' ';' | awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub("_",",",$i)} 1' > /dev/null
real 0m2.488s
user 0m5.025s
sys 0m0.242s
2/ The alternative awk-based solution: ravindersingh13
time cat avec_vapeur.csv | awk -F"\"" '{for(i=1;i<=NF;i+=2){gsub(/,/,";",$i)}} 1' OFS="\"" > /dev/null
real 0m4.705s
user 0m4.631s
sys 0m0.111s
3/ The sed-based solution: sjsam
time cat avec_vapeur.csv | sed -E 's/,([[:space:]]*")/;\1/g;s/("[[:space:]]*),/\1;/g' > /dev/null
real 0m0.174s
user 0m0.118s
sys 0m0.130s
-> The clear winner is the sed-based solution!
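A timing comparison is only fair if the commands produce the same output, so it may be worth checking that on your own data first. A small sketch, using a made-up input line (two adjacent unquoted fields) and hypothetical function names awk_version and sed_version wrapping the commands from 1/ and 3/:

```shell
# Answer 1/: quote-aware awk | tr | awk pipeline.
awk_version() {
  awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(",","_",$i)} 1' |
  tr ',' ';' |
  awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub("_",",",$i)} 1'
}

# Answer 3/: sed command, which only rewrites commas next to a double quote.
sed_version() {
  sed -E 's/,([[:space:]]*")/;\1/g;s/("[[:space:]]*),/\1;/g'
}

# Hypothetical input line: the first comma has no quote on either side.
line='a, b, "c,d"'
printf '%s\n' "$line" | awk_version   # prints: a; b; "c,d"
printf '%s\n' "$line" | sed_version   # prints: a, b; "c,d"
```

On the three demo lines both give the same result, because every separator there touches a quoted field; on data with consecutive unquoted columns the outputs can differ.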
The last answer I got: inian
time cat avec_vapeur.csv | awk -v OFS=';' 'BEGIN{FPAT = "([^,]+)|([[:space:]]*\"[^\"]+\")"}{$1=$1}1' > /dev/null
real 0m37.507s
user 0m37.463s
sys 0m0.122s
which is also the slowest I tested (no judgement here, I just ran these tests for fun!)
Update: I initially misread inian, sorry. If I understand correctly, I should add
LC_ALL=C
to speed things up.
Now I get:
real 0m20.268s
user 0m20.008s
sys 0m0.087s
which is faster but not as fast as the sed solution.
Now the game is over: no more benchmarks from me (I have some work to do too).
The last word goes to the winner, the perl solution: sjsam
time cat avec_vapeur.csv | perl -ane 's/,(\s*"[^"]*"\s*),/;$1;/g;print' > /dev/null
real 0m0.134s
user 0m0.096s
sys 0m0.104s
which is even slightly faster than the sed one (at least with my tests)!