
A line from a car dataset (https://archive.ics.uci.edu/ml/datasets/Auto+MPG) looks like this:

15.0   8.   429.0      198.0      4341.      10.0   70.  1.     "ford galaxie 500"

How would one replace each run of whitespace (the file contains both spaces and tabs) with a single comma, but not inside the quotes, preferably using sed, to turn the dataset into a real CSV? Thanks!

importError
  • Maybe this will help: http://stackoverflow.com/questions/14916159/sed-replace-spaces-within-quotes-with-underscores – John Zwinck Jan 21 '15 at 07:30
  • What have you already tried that failed? – NeronLeVelu Jan 21 '15 at 08:13
  • I tried:
      $ sed 's/[^"] [^"]//g' data/auto-mpg.data-original
      $ sed 's/[^"][ \t][^"]/,/g' data/auto-mpg.data-original
      $ sed 's/[^"][ \t]*[^"]/,/g' data/auto-mpg.data-original
      $ sed 's/[ \t]/,/g;s/,,,//g' data/auto-mpg.data
      $ sed 's/[ \t]/,/g' data/auto-mpg.data
      $ perl -pe 's/"(.+?[^\\])"/($ret = $1) =~ (s#,##g); $ret/ge' data/auto-mpg.data
      $ sed 's/\(.*"\),/\1 /' data/auto-mpg.data
      $ sed 's/\(.*\"\),/\1 /g' data/auto-mpg.data-commad
    – importError Jan 21 '15 at 08:24
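To see why squeezing whitespace on its own is not enough, here is a minimal sketch that runs a plain "every run of blanks becomes one comma" substitution on the sample line. The tab before the quoted name is an assumption standing in for the mixed spaces and tabs the question mentions. The quoted car name gets split up as well:

```shell
#!/bin/sh
# Sample record from the question; the tab before the quoted name is an
# assumption standing in for the mixed spaces/tabs in the real file.
line=$(printf '15.0   8.   429.0      198.0      4341.      10.0   70.  1.\t"ford galaxie 500"')

# Naive approach: squeeze every run of spaces/tabs into a single comma.
# [[:space:]] is a POSIX class, so this behaves the same in GNU and BSD sed.
naive=$(printf '%s\n' "$line" | sed 's/[[:space:]][[:space:]]*/,/g')
echo "$naive"
# Prints: 15.0,8.,429.0,198.0,4341.,10.0,70.,1.,"ford,galaxie,500"
# The spaces inside the quotes were turned into commas too.
```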

2 Answers


Do it with awk:

awk -F'"' 'BEGIN { OFS="\"" } { for(i = 1; i <= NF; i += 2) { gsub(/[ \t]+/, ",", $i); } print }' filename.csv

With " as the field separator, the odd-numbered fields ($1, $3, ...) are the parts of the line outside the quotes, which is where the whitespace should be replaced. In detail:

BEGIN { OFS = FS }               # output should also be separated by "
{
  for(i = 1; i <= NF; i += 2) {  # in every second field
    gsub(/[ \t]+/, ",", $i)      # replace each run of blanks with one comma
  }
  print                          # and print the whole shebang
}
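As a quick check, here is a minimal sketch that wraps the awk one-liner above in a shell function and runs it on the sample line (again assuming a tab before the quoted name). The function name `to_csv` is hypothetical and not part of the answer:

```shell
#!/bin/sh
# to_csv is a hypothetical helper name; the awk program is the answer's.
# Splitting on '"' puts the unquoted parts in the odd-numbered fields,
# so only those have their blank/tab runs replaced by commas.
to_csv() {
  awk -F'"' 'BEGIN { OFS = "\"" }
             { for (i = 1; i <= NF; i += 2) gsub(/[ \t]+/, ",", $i); print }' "$@"
}

line=$(printf '15.0   8.   429.0      198.0      4341.      10.0   70.  1.\t"ford galaxie 500"')
csv=$(printf '%s\n' "$line" | to_csv)
echo "$csv"
# Prints: 15.0,8.,429.0,198.0,4341.,10.0,70.,1.,"ford galaxie 500"
```

With a file argument it converts the whole dataset, e.g. `to_csv data/auto-mpg.data-original > auto-mpg.csv`. Setting OFS to `"` matters: assigning to `$i` makes awk rebuild the line with OFS, and that puts the quotes back.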
Wintermute

This might work for you (GNU sed):

sed 's/\("[^"]*"\|[0-9.]*\)\s\s*/\1,/g' file

This matches either a quoted string or a number (a run of digits and dots) followed by whitespace, and replaces that whitespace with a single comma, for every such match on each line.

For a less specific pattern (as suggested in the comments), use:

sed -r 's/("[^"]*"|\S+)\s+/\1,/g' file
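One caveat is worth checking before relying on either sed command. Both patterns treat a quoted string as a unit only when whitespace follows it. On the sample line the quoted name is the last field, so that alternative never matches there and the search continues inside the quotes. As far as I can tell, GNU sed then produces `"ford,galaxie,500"`. The sketch below is a portable alternative, not potong's own command. It makes the trailing whitespace optional, excludes `"` from unquoted tokens, and then strips the comma added after the last token. The function name `to_csv_sed` is hypothetical:

```shell
#!/bin/sh
# Alternative sketch (not the answer's exact command): each token, either a
# "quoted string" or a run of non-blank, non-quote characters, is rewritten
# as token + comma, swallowing any blanks/tabs after it; the comma added after
# the last token is then removed. -E and [[:space:]] work in GNU and BSD sed.
to_csv_sed() {
  sed -E 's/("[^"]*"|[^[:space:]"]+)[[:space:]]*/\1,/g; s/,$//' "$@"
}

line=$(printf '15.0   8.   429.0      198.0      4341.      10.0   70.  1.\t"ford galaxie 500"')
csv=$(printf '%s\n' "$line" | to_csv_sed)
echo "$csv"
# Prints: 15.0,8.,429.0,198.0,4341.,10.0,70.,1.,"ford galaxie 500"
```

This keeps to sed, as the question preferred, at the cost of a second substitution. The awk version avoids the issue entirely because it never looks inside the quoted fields.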
potong
  • That confused some of my test inputs, and I drew false conclusions at first (sorry about the first comment). There's a typo in your pattern: the closing paren should be escaped, and may I suggest replacing `[0-9.]` with `[^[:space:]]` to make it work with non-numeric unquoted tokens? That is: `s/\("[^"]*"\|[^[:space:]]*\)\s\s*/\1,/g` – Wintermute Jan 21 '15 at 10:11
  • Thank you for your answer, I will let you know after getting the chance to try it out. Thanks also to whoever pointed out that using sed instead of awk would be silly. – importError Jan 23 '15 at 04:22