sed - Replace comma after first regex match

Question

i m trying to perform the following substitution on lines of the general format:

BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......

as you see the problem is that its a comma separated file, with a specific field containing a comma decimal. I would like to replace that with a dot .

I ve tried this, to replace the first occurence of a pattern after match, but to no avail, could someone help me?

sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"

Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:

gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'

That yielded the same file unchanged.

score 3 · Accepted Answer · answered May 20 '19 at 10:04

CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," } 
       { for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file

See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.

You don't need `if ($i ~ /[,]/)` - the `gsub(/[,]/...)` just won't do anything if there's no match for `[,]`. Also that will delete empty fields - you should use `*`s instead of `+`s in the `FPAT` setting. — Ed Morton, May 20 '19 at 13:23

score 1 · Answer 2 · answered May 20 '19 at 10:49

1

The following sed will convert all decimal separators in quoted numeric fields:

 sed 's/"\([-+]\?[0-9]*\)[,]\?\([0-9]\+\([eE][-+]\?[0-9]+\)\?\)"/"\1.\2"/g'

See: https://www.regular-expressions.info/floatingpoint.html

answered May 20 '19 at 10:49

kvantour

25,269
4
47
72

score 1 · Answer 3 · answered May 20 '19 at 12:49

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file

This regexp matches a , within a pair of "'s and replaces it by a .. The regexp is anchored to the start of the line and thus needs to be repeated until no further matches can be matched, hence the :a and the ta commands which causes the substitution to be iterated over whilst any substitution is successful.

N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.

score 1 · Answer 4 · answered May 20 '19 at 13:48

If your input always follows that format of only one quoted field containing 1 comma then all you need is:

$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......

If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.

score 0 · Answer 5 · answered May 20 '19 at 10:23

0

Assuming you have this: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC

Try this: awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","

Output will be: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC

You simply need to know the field numbers for replacing the field separator between them.

answered May 20 '19 at 10:23

Skeptic Biped

92
1
8

1

Why don't you define `OFS=","` in the first awk? With your current solution, you will destroy all spaces which could be located in strings. – kvantour May 20 '19 at 10:26
I wanted to keep it clean and simple. Also, I assumed no white spaces other than the field separators. – Skeptic Biped May 20 '19 at 10:32
But if you would define `OFS` directly, you get rid of a single command: `awk '{print $1,$2,$3,$4"."$5,$6,$7}' FS="," OFS="," file` – kvantour May 20 '19 at 10:34
Well, you can do that too! Thanks for your input. – Skeptic Biped May 20 '19 at 10:37

score 0 · Answer 6 · answered May 20 '19 at 10:30

In order to use regexp as in perl you have to activate extended regular expression with -r. So if you want to replace all numbers and omit the " sign, then you can use this:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'

If you want to replace first occurrence only you can use that:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'

https://www.gnu.org/software/sed/manual/sed.txt

sed - Replace comma after first regex match

6 Answers6