I have a csv file containing:

# Director, Movie Title, Year, Comment

Ethan Coen, No Country for Old Men, 2007, none

Ethan Coen, "O Brother, Where Art Thou?", 2000, none

Ethan Coen, The Big Lebowski, 1998, "uncredited (with his brother, Joel)"

I want to change the field separator from "," to "|", but I don't want to change a comma that is inside a quoted string. The result should look like this:

# Director| Movie Title| Year| Comment

Ethan Coen| No Country for Old Men| 2007| none

Ethan Coen| "O Brother, Where Art Thou?"| 2000| none

Ethan Coen| The Big Lebowski| 1998| "uncredited (with his brother, Joel)"

I tried this:

sed -e 's/(".)(.")/|\1 \2/g'

This is the result I am getting so far:

Ethan Coen, |"O Brother, Where Art Thou? ", 2000, none

Ethan Coen, The Big Lebowski, 1998, |"uncredited (with his brother, Joel) "

AndyG
sbarboza

2 Answers

Approach: change the commas inside quoted strings into \r, replace the remaining commas with |, then change each \r back into a comma. A first attempt is still wrong:

# Wrong
sed -E 's/("[^,]*),([\"]*)/\1\r\2/g; s/,/|/g;s/\r/,/g' file

It fails on lines with two commas in one quoted field.
The first replacement should be repeated until all quoted commas are replaced:

sed -E ':a;s/("[^,"]*),([^"]*)"/\1\r\2"/g; ta; s/,/|/g;s/\r/,/g' file
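As far as I can tell, the loop above is not anchored, so on a line with two quoted fields it can also match from the closing quote of one field to the opening quote of the next and protect the separator commas in between. A possible variant, sketched below, anchors the pattern at the start of the line and skips complete pairs of quotes, so only a comma after an unclosed quote is protected. The function name csv_to_pipe is made up for illustration, and the sketch assumes GNU sed, balanced quotes on every line, and input without literal carriage returns.

```shell
# Hypothetical helper: reads CSV lines on stdin, writes them with | as separator.
# Assumes GNU sed (for \r), balanced double quotes, and no CR characters in the input.
csv_to_pipe() {
  sed -E '
    :a
    # Skip complete "..." pairs, then protect the first comma after an unclosed quote.
    s/^(([^"]*"[^"]*")*[^"]*"[^",]*),/\1\r/
    ta
    # Every comma still present is a field separator.
    s/,/|/g
    # Restore the protected commas.
    s/\r/,/g
  '
}
```

Usage would be `csv_to_pipe < file`. Counting quotes in pairs from the start of the line is what keeps the pattern from starting at a closing quote.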
Walter A

This might work for you (GNU sed):

sed -E 's#"[^"]*"#$(echo &|sed "y/,/\\n/;s/.*/\\\"\&\\\"/")#g;s/.*/echo "&"/e;y/,\n/|,/' file

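The first substitution wraps each double-quoted string in a shell command substitution that translates the commas inside it into newlines, and the e flag then evaluates the whole line through echo. Finally, the y command translates the remaining commas to | and the newlines back to commas.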
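Note that the e flag hands each input line to a shell, so this is best kept to trusted input. The same idea, treating commas outside quotes differently from those inside, can be sketched without running a shell by splitting each line on the double quote character in awk: the odd-numbered pieces are then the text outside quotes. The function name csv_to_pipe_awk is made up for illustration, and the sketch assumes balanced quotes on every line.

```shell
# Hypothetical helper: reads CSV lines on stdin, writes them with | as separator.
# Assumes every line has balanced double quotes.
csv_to_pipe_awk() {
  awk -F'"' -v OFS='"' '{
    # Splitting on the quote character leaves unquoted text in fields 1, 3, 5, ...
    for (i = 1; i <= NF; i += 2)
      gsub(/,/, "|", $i)
    # A modified record is rebuilt with OFS, which puts the quotes back.
    print
  }'
}
```

Usage would be `csv_to_pipe_awk < file`.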

potong