0

I have a CSV file with a comma delimiter and need to remove this delimiter in string to manipulate the data.

The file is like :
Col1, Col2, Col3, Col4
100, 08 sciences, "xx, yy, ww", US
110, 06 culture, "abc, ww", US
115, 05 geology, "geology", CA

I need to scan the string to replace the delimiter but don't know how to proceed. I've tried sed 's/","/" "/g' (using Mac OS 10.10)

So when I'm trying to keep some column I can't reach the expected result : cut -d , -f 2,3 myfile.csv > newfile.csv

I got :
08 sciences, "xx
06 culture, "abc
05 geology, "geology"

Instead of
08 sciences, "xx, yy, ww"
06 culture, "abc, ww"
05 geology, "geology"

Any help please ?

jahina97
  • 1
  • 2
  • Is the troublesome field always known - I mean, can you say at the start it is field 3 this time? Is there just one troublesome field, or can there be two fields in double quotes on one line? Consider adding `awk` to your tags and you'll get an answer faster... – Mark Setchell Feb 26 '18 at 21:07
  • https://stackoverflow.com/questions/29642102/how-to-make-awk-ignore-the-field-delimiter-inside-double-quotes might help – Sundeep Feb 27 '18 at 04:36
  • @Mark Setchell To answer your question yes the problem is only on field 3 where there is text. – jahina97 Feb 27 '18 at 04:57
  • 1
    @Sundeep Thanks, I gonna take a look – jahina97 Feb 27 '18 at 05:00

4 Answers4

0

When you want 2nd and third column, use

sed 's/[^,]*,//;s/,[^,]*//' inputfile
# or
sed 's/[^,]*,\(.*\),.*/\1/' inputfile
# or removing space before col3
sed 's/[^,]*, *\(.*\),.*/\1/' inputfile
Walter A
  • 19,067
  • 2
  • 23
  • 43
  • I tried your script but the first column has been squeezed. I don't know why. Thus the delimiter is still in the string. Does your script just make a scan to read the entire string as a column from first quote until next ? – jahina97 Feb 27 '18 at 05:10
  • Perhaps you have `\r` in your file. You can look for `^M` with `cat -v inputfile` and remove them with `sed 's/\r//' inputfile` (option -i or redirect to other file). – Walter A Feb 27 '18 at 07:34
  • I already did that but couldn't remove the CR. I guess it's because of macos. Gonna try on Windows. – jahina97 Feb 27 '18 at 16:21
0

Assuming your third column would consist of " "

$ awk -F"," -v OFS="," '{match($0,/\".*\"/,a); print $2 OFS a[0]}' file
 08 sciences,"xx, yy, ww"
 06 culture,"abc, ww"
 05 geology,"geology"
Rahul Verma
  • 2,946
  • 14
  • 27
0

I think you want to replace commas with spaces in the field inside double quotes.

If so, use awk and tell it that the fields are separated by double quotes, and that you want to replace commas with spaces in field 2, like this:

awk -F'"' '{gsub(/,/," ",$2)}1' OFS='"' yourFile

Sample Output

100, 08 sciences, "xx  yy  ww", US
110, 06 culture, "abc  ww", US
115, 05 geology, "geology", CA
Mark Setchell
  • 191,897
  • 31
  • 273
  • 432
0
awk 'NR>1{gsub(/^.....|....$/,"");print}' file

08 sciences, "xx, yy, ww"
06 culture, "abc, ww"
05 geology, "geology"
Claes Wikner
  • 1,457
  • 1
  • 9
  • 8