Identify and replace selective space inside given text file

Question

I am new to sed. I need to selectively replace spaces with "," in a file whose content is shown below. I do not want to replace spaces inside double quotes, but all other spaces need to be replaced.

File Content

my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"

Pattern used with sed to replace spaces with ",": 's/ /,/g'

Actual Output

my,data,"this,is,my,very,first,encounter,with,sed",,"valuable",-,-,"c,l,e,a,r"

Expected Output

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

While it is "possible" in sed, don't. Write a proper CSV parser in a different easier programming language. — KamilCuk, Jul 26 '20 at 12:33

score 1 · Accepted Answer · answered Jul 26 '20 at 12:59

The following sed script (with comments), fed its input through a bash here-string:

<<<'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"' sed -E '
    # Split input with each character on its own line
    s/./&\n/g;
    # Add a newline on the end to separate output from input
    s/$/\n/;
    # Each line has one character
    # Add a leading character that stores "state"
    # There are two states available - in quoting or not in quoting
    # The state character is space when we are not in quotes
    # The state character is double quote when we are in quotes
    s/^/ /;
    # For each character in input
    :again; {
        # Substitute a space that is not in quotes for a comma
        s/^  / ,/

        # When quotes is encountered and we are not in quotes
        /^ "/{
            # Change state to quotes
            s//""/
            b removed_quotes
        } ; {
            # When quotes is encountered and we are in quotes
            # then we are no longer in quotes
            s/^""/ "/
        } ; : removed_quotes

        # Preserve state as the first character
        # Add the parsed character to the output on the end
        # Preserve the rest
        s/^(.)(.)\n(.*)/\1\3\2/;
        # If end of input was not reached, then parse another character.
        /^.\n/!b again;
    };
    # Remove the leading state character with the newline
    s///;
'

outputs:

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"
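The sed script is essentially a two-state machine that walks the input one character at a time. As a rough illustration of that idea outside sed, here is a minimal Python sketch. It is not a line-by-line translation of the script, the function name replace_unquoted_spaces is hypothetical, and it assumes a single line of input with no escaped quotes.

```python
def replace_unquoted_spaces(line: str) -> str:
    """Replace every space outside double quotes with a comma."""
    in_quotes = False  # the "state": False = outside quotes, True = inside
    out = []
    for ch in line:
        if ch == '"':
            # Each double quote toggles the state and is copied unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == " " and not in_quotes:
            # A space outside quotes becomes a comma.
            out.append(",")
        else:
            # Everything else, including spaces inside quotes, is copied as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = 'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"'
    print(replace_unquoted_spaces(sample))
```

The single boolean plays the role of the leading state character in the sed version (space for "outside", double quote for "inside").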

And a one-liner, because who reads these comments:

sed -E 's/./&\n/g;s/$/\n/;s/^/ /;:a;s/^  / ,/;/^ "/{s//""/;bq;};s/^""/ "/;:q;s/^(.)(.)\n(.*)/\1\3\2/;/^.\n/!ba;s///'

I think a newline \n in the replacement string of the s command is an extension not required by POSIX. Any other unique character could be used instead of a newline to separate the input during parsing. In any case, I tested this with GNU sed.

wrt `I think a newline ...` - the only 2 seds that have a `-E` arg are GNU and OSX/BSD, the former will work with `\n`, the latter won't, so yes it's GNU sed only. There's almost certainly other GNU-isms in there too. — Ed Morton, Jul 26 '20 at 13:29

score 1 · Answer 2 · answered Jul 26 '20 at 13:00

As mentioned in the comments, this is something better suited for an actual CSV parser instead of trying to kludge up something using regular expressions - especially sed's rather basic regular expressions.

A one-liner in Perl using the useful Text::AutoCSV module (install it through your OS package manager or your favorite CPAN client):

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(sep_char=>" ", out_sep_char=>",")->write' < input.txt
my,data,"this is my very first encounter with sed",,valuable,-,-,"c l e a r"
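For comparison, here is a minimal sketch of the same parser-based approach in Python, assuming its standard csv module is acceptable in place of Text::AutoCSV. csv.reader does the parsing with a blank as the delimiter; the fields are joined by hand because csv.writer's default QUOTE_MINIMAL mode would not quote fields that merely contain blanks. The function name blanks_to_commas and the rule "quote a field if it contains a blank, comma or quote" are assumptions chosen to reproduce the output shown above.

```python
import csv
import io


def blanks_to_commas(text: str) -> str:
    """Parse blank-separated, double-quoted fields and join them with commas."""
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar='"')
    out_lines = []
    for fields in reader:
        cells = []
        for field in fields:
            # Quote a field if it contains a blank, comma or quote; double embedded quotes.
            if any(c in field for c in ' ,"'):
                field = '"' + field.replace('"', '""') + '"'
            cells.append(field)
        out_lines.append(",".join(cells))
    return "\n".join(out_lines)


if __name__ == "__main__":
    print(blanks_to_commas('my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"'))
```

As with the Perl version, the two adjacent blanks produce an empty field, and "valuable" loses its now-unnecessary quotes.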

Answer 3 · answered Jul 26 '20 at 13:26 by Ed Morton

With GNU awk for FPAT:

$ awk -v FPAT='[^ ]*|"[^"]+"' -v OFS=',' '{$1=$1} 1' file
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

Your input is a CSV where the C means "Character" instead of the traditional "Comma", the character in question being a blank, and you're just trying to convert it to a comma-separated CSV. See What's the most robust way to efficiently parse CSV using awk? for more information on what the above does and on parsing CSVs with awk in general.
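The key idea of FPAT is to describe what a field looks like rather than what separates fields. Below is a rough Python sketch of that idea; it does not emulate gawk's exact FPAT rules, and the names FIELD and split_fields are hypothetical. One difference matters: gawk uses leftmost-longest matching, so at an opening quote "[^"]+" beats the shorter [^ ]* match, while Python tries alternatives left to right. The sketch therefore lists the quoted alternative first and uses a lookahead to require that each field ends at a blank or at the end of the line.

```python
import re

# A field is a double-quoted string or a run of non-blank characters, and it must
# end at a blank or at the end of the line. Python tries alternatives in order
# (it does not pick the longest one), so the quoted form is listed first.
FIELD = re.compile(r'(?:"[^"]*"|[^ ]*)(?= |\Z)')


def split_fields(line: str) -> list:
    """Split a line by describing what a field looks like, in the spirit of FPAT."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        assert m is not None  # [^ ]* can always match, so this never fails
        fields.append(m.group())
        pos = m.end()
        if pos == len(line):
            return fields
        pos += 1  # step over the single blank that ends this field


if __name__ == "__main__":
    line = 'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"'
    print(",".join(split_fields(line)))
```

Joining the fields with "," plays the role of OFS=',' combined with $1=$1 in the awk version.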

score 0 · Answer 4 · answered Jul 27 '20 at 04:07

awk 'BEGIN {RS=ORS="\""} NR%2 {gsub(" ",",")} {print}' file

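In the BEGIN block, set the double quote as both the record separator (RS) and the output record separator (ORS).
For odd-numbered records (NR%2), i.e. text outside quotes, replace every space with a comma.
Print every record; because ORS is also a double quote, the quotes consumed as separators are written back.

If the file ends with a newline, as text files normally do, the final record is just that newline, and print appends ORS to it, so the output ends with an extra " after the last newline. With GNU awk, using {printf "%s%s", $0, RT} in place of {print} avoids this, since RT is empty for the last record.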
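The same trick in Python, as a minimal sketch (the function name replace_outside_quotes is hypothetical): splitting on the double quote gives pieces that alternate between outside and inside quotes, just like awk's records when RS is a quote.

```python
def replace_outside_quotes(line: str) -> str:
    """Replace blanks with commas, but only in text outside double quotes."""
    pieces = line.split('"')
    # awk's odd records (NR%2 with NR starting at 1) are the even indices here.
    for i in range(0, len(pieces), 2):
        pieces[i] = pieces[i].replace(" ", ",")
    # Joining puts back exactly one quote between neighboring pieces.
    return '"'.join(pieces)


if __name__ == "__main__":
    print(replace_outside_quotes('my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"'))
```

Because str.join inserts one quote between each pair of neighboring pieces, the output contains exactly as many quotes as the input.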

answered by thanasisp

score 0 · Answer 5 · answered Jul 28 '20 at 12:38

This might work for you (GNU sed):

sed -E ':a;s/^((("[^"]*")*[^" ]*)*) /\1,/;ta' file

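Replace the first space that lies outside double quotes with a comma, and repeat until no substitution is made. The captured prefix \1 is built from zero or more repetitions of (zero or more double-quoted strings followed by zero or more characters that are neither spaces nor quotes), so a space right after it cannot be inside quotes. The replacement \1, keeps the prefix and turns that space into a comma, and ta branches back to :a as long as a substitution succeeded.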
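The loop can be mirrored in Python as a rough sketch, assuming Python's re module. The prefix pattern is rewritten as (?:"[^"]*"|[^" ])*, which accepts the same prefixes as (("[^"]*")*[^" ]*)* but avoids nested quantifiers, which can backtrack heavily in a backtracking engine like Python's. The names FIRST_UNQUOTED_BLANK and replace_loop are hypothetical.

```python
import re

# A prefix of double-quoted strings and characters that are neither blanks nor
# quotes, followed by a blank: that blank is the first one outside quotes.
FIRST_UNQUOTED_BLANK = re.compile(r'^((?:"[^"]*"|[^" ])*) ')


def replace_loop(line: str) -> str:
    """Turn one unquoted blank into a comma per pass, until a pass changes nothing."""
    while True:
        line, count = FIRST_UNQUOTED_BLANK.subn(r"\1,", line, count=1)
        if count == 0:  # like sed's "ta": stop once no substitution was made
            return line


if __name__ == "__main__":
    print(replace_loop('my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"'))
```

Like the sed version, each pass rescans the line from the start, so the work grows quadratically with the number of blanks; that is fine for short lines.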
