
I have a CSV file with multiple lines like:

"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"

I would like to replace "", with ", only in column 8 (that is, where it follows GAUZE PACKING STRIPS 1/4 or ACE WRAP 3), without touching any other "", in the line.

I have tried sed 's/[[:alnum:]]""//g' file.csv, but it also removes the alphanumeric character before the "" (for example <num>"").

Any ideas? Much appreciated!

Jotne
  • 1
    Looks like the quotes you have issues with actually denote inches (`3/4"` and `3"`). You may want to take a look at https://stackoverflow.com/questions/17808511/ before you go down the route as suggested by your question. – Happy Green Kid Naps Oct 19 '19 at 03:17
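
If the trailing quote is meant as an inch mark, as this comment suggests, one option is to keep it and escape it instead of deleting it. In the usual CSV convention (RFC 4180), a double quote inside a quoted field is written as two double quotes. A minimal sketch, assuming fields contain no commas or other embedded quotes; `escape_inch_marks` is a hypothetical name:

```shell
# Hypothetical helper: keep the stray quote, but write it in escaped form.
escape_inch_marks() {
    # \1 is the field up to and including its first closing quote.
    # Appending "" yields: content, escaped quote (""), closing quote (").
    sed -E 's/("[^",]*")"/\1""/g' "$@"
}

printf '%s\n' '"ACE WRAP 3"","Sent"' | escape_inch_marks
```

This substitution is not idempotent: running it a second time on its own output adds another quote, so it should be applied only once per file.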

2 Answers


You can use a capture group to match a double-quoted field that is immediately followed by an extra double quote, and then replace the whole match with just the field.

The regex to match would look something like this: ("[^",]*")". Note two things. First, the " characters are matched literally. Second, the expression in the middle, [^",]*, matches any run of characters other than " or ,. This keeps the matched field from containing a quote or a comma, so a single match cannot span more than one field.

Lastly, the parentheses form a capture group, and we can refer to whatever matched the sub-regex between ( and ) with a backslash and a number. For example, \1 is replaced by the match of the first capture group, \3 by the third, and so on.
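
As a small illustration of back-references, the sketch below wraps whatever the first group captured in square brackets, so the captured text is easy to see. The sample string and the function name `show_capture` are made up for this illustration, and `-E` is used here as the equivalent of `-r` in GNU sed.

```shell
# Hypothetical helper: make the text captured by group 1 visible
# by wrapping the back-reference \1 in square brackets.
show_capture() {
    printf '%s\n' "$1" | sed -E 's/("[^",]*")"/[\1]/g'
}

show_capture '"ACE WRAP 3"","Sent"'
```

Only the first field is bracketed, because only that field is followed by an extra quote; the stray quote itself is part of the match but not part of \1, so it disappears.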

The sed script for what you need may look something like this:

sed -re 's/("[^",]*")"/\1/g'

Note that the last double quote is outside the capture group. The whole match is replaced by \1, so that trailing quote is dropped.

The -r flag (equivalently, -E) makes sed use Extended Regular Expressions (ERE), where plain ( and ) form a capture group. Without the flag, sed uses Basic Regular Expressions (BRE), where a capture group has to be written with escaped parentheses, \( and \).

Notice also the /g at the end. This is needed for sed to be able to match and replace more than one occurrence in the same line.
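
The command above fixes a stray quote after any field. Since the question asks about column 8 only, here is a sketch of a stricter variant that anchors the match to the eighth field. It assumes the first seven fields are well-formed quoted fields with no embedded quotes; `fix_col8` and the sample row are hypothetical.

```shell
# Hypothetical helper: drop a stray quote only after the 8th field.
# ("[^"]*",){7} consumes the first seven quoted fields and their commas,
# "[^"]*" matches the 8th field, and the final " is the stray quote,
# which is left outside group 1 and therefore removed.
fix_col8() {
    sed -E 's/^(("[^"]*",){7}"[^"]*")"/\1/' "$@"
}

printf '%s\n' '"a","b","c","","e","f","","ACE WRAP 3"","","x""' | fix_col8
```

No /g is needed here, because the pattern is anchored with ^ and can match at most once per line. The stray quote after the last field in the sample row is left alone.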

Example:

$ cat test
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00",""","XXX","XXX","2019-02-12T23:57:48-06:00"","XXX-XXX-176568981"
$ cat test | sed -re 's/("[^",]*")"/\1/g'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
  • This one worked really well! Thank you :) Since I was working with a lot of files, I added an extra parameter to keep it silent and make the changes infile. `sed -i -re 's/("[^",]*")"/\1/g' file.csv` – Rituraj Golawar Oct 20 '19 at 21:43

Using awk:

$ awk '
BEGIN { FS=OFS="," }           # set delimiters
{
    if($8!="\"\"")             # if $8 is not empty ie. ""
        sub(/\"\"$/,"\"",$8)   # replace trailing double quotes with a single double quote
}1' file                       # output
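
To see why the `$8!="\"\""` guard is there, the sketch below runs the same `sub()` call with and without it on a row whose eighth field is empty. The sample row and the function names `with_guard` and `without_guard` are made up for this illustration.

```shell
row='"a","b","c","d","e","f","g","","i"'

# Same logic as the script above: skip the field when it is just "".
with_guard() {
    awk 'BEGIN { FS = OFS = "," } { if ($8 != "\"\"") sub(/""$/, "\"", $8) } 1'
}

# No guard: the empty field "" also ends in two quotes, so it gets damaged.
without_guard() {
    awk 'BEGIN { FS = OFS = "," } { sub(/""$/, "\"", $8) } 1'
}

printf '%s\n' "$row" | with_guard      # eighth field stays ""
printf '%s\n' "$row" | without_guard   # eighth field becomes a lone "
```

Both variants split on every comma, so they assume no field contains a comma. The original script, run on the two lines from the question, gives the result shown next.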

Output:

"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
James Brown